diff --git "a/wandb/run-20220325_193848-1sz5964i/files/output.log" "b/wandb/run-20220325_193848-1sz5964i/files/output.log" new file mode 100644--- /dev/null +++ "b/wandb/run-20220325_193848-1sz5964i/files/output.log" @@ -0,0 +1,6270 @@ + + 0%| | 0/1115 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:38:51,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:38:53,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:38:53,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:38:55,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:38:55,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:38:57,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:38:57,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:38:58,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:38:59,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:00,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:39:01,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:02,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:03,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:04,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:05,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:06,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:07,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:08,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:09,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:10,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:11,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:12,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:39:12,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:15,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:15,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:17,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:17,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:18,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:19,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:20,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:21,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 1/1115 [00:32<9:57:12, 32.17s/it] + 0%| | 1/1115 [00:32<9:57:12, 32.17s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:39:22,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:23,265 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:24,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:39:25,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:26,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:27,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:28,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:28,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:30,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:30,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:31,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:32,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:33,801 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:34,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:35,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:39:36,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:37,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:38,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:39,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:39,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:41,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:41,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:42,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:43,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:44,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:46,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:39:47,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:48,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:49,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:50,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 8.857, 'learning_rate': 0.0, 'epoch': 0.01} +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:50,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 2/1115 [01:01<9:27:39, 30.60s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:39:52,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:52,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:54,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:54,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:55,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:39:56,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:57,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:39:58,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:39:59,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:00,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:01,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:01,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:03,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:03,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:04,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:05,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:06,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:07,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:08,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:40:09,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:10,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:10,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:12,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:12,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:13,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:14,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:15,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:16,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:17,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:18,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:19,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 8.7704, 'learning_rate': 6e-07, 'epoch': 0.01} +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:19,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 3/1115 [01:30<9:15:04, 29.95s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:40:21,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:21,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:23,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:23,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:24,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:25,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:26,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:27,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:28,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:29,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:30,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:40:30,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:31,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:32,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:33,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:34,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:35,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:36,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:37,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:37,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:39,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:39,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:40,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:40:41,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:42,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:43,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:44,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:45,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:46,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:46,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:47,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 8.6839, 'learning_rate': 1.2e-06, 'epoch': 0.02} +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:48,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 4/1115 [01:59<9:04:22, 29.40s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:40:49,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:50,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:51,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:40:52,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:53,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:53,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:55,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:55,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:56,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:57,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:40:58,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:40:59,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:00,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:01,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:02,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:41:02,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:03,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:04,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:05,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:06,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:07,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:08,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:09,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:09,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:10,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:11,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:12,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:41:13,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:14,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:15,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:16,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:16,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 5/1115 [02:27<8:56:05, 28.98s/it] + 0%|▎ | 5/1115 [02:27<8:56:05, 28.98s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:41:18,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 5/1115 [02:27<8:56:05, 28.98s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:21,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:25,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:25,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:28,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:28,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:32,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:35,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:35,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:39,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:39,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:42,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 1%|▍ | 6/1115 [02:55<8:50:08, 28.68s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:41:46,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:49,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:54,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:41:57,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:01,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:04,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:08,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:11,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 1%|▌ | 7/1115 [03:24<8:50:41, 28.74s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:18,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:22,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:25,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:28,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:32,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:35,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:39,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 1%|▌ | 8/1115 [03:52<8:44:55, 28.45s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:46,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:49,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:53,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:42:56,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:43:00,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:43:03,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:43:07,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 1%|▋ | 9/1115 [04:20<8:40:21, 28.23s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:43:14,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:43:17,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 7.4337, 'learning_rate': 4.8e-06, 'epoch': 0.04}
+{'loss': 7.22, 'learning_rate': 5.399999999999999e-06, 'epoch': 0.05}
+{'loss': 6.9989, 'learning_rate': 5.999999999999999e-06, 'epoch': 0.05}
+{'loss': 6.8282, 'learning_rate': 6.599999999999999e-06, 'epoch': 0.06}
+{'loss': 6.6831, 'learning_rate': 7.2e-06, 'epoch': 0.06}
+{'loss': 6.4666, 'learning_rate': 7.799999999999998e-06, 'epoch': 0.07}
+{'loss': 6.2614, 'learning_rate': 8.4e-06, 'epoch': 0.07}
+ 2%|█▏ | 17/1115 [07:56<8:09:54, 26.77s/it]
+{'loss': 6.0435, 'learning_rate': 9.6e-06, 'epoch': 0.08}
+{'loss': 5.8492, 'learning_rate': 1.02e-05, 'epoch': 0.09}
+{'loss': 5.8448, 'learning_rate': 1.0799999999999998e-05, 'epoch': 0.09}
+{'loss': 5.6986, 'learning_rate': 1.14e-05, 'epoch': 0.09}
+{'loss': 5.673, 'learning_rate': 1.1999999999999999e-05, 'epoch': 0.1}
+{'loss': 5.4552, 'learning_rate': 1.26e-05, 'epoch': 0.1}
+{'loss': 5.4442, 'learning_rate': 1.3199999999999997e-05, 'epoch': 0.11}
+{'loss': 5.2917, 'learning_rate': 1.3799999999999998e-05, 'epoch': 0.11}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:50:29,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|█▊ | 26/1115 [11:46<7:37:52, 25.23s/it]
+{'loss': 5.29, 'learning_rate': 1.44e-05, 'epoch': 0.12}
+ 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]
+{'loss': 5.1356, 'learning_rate': 1.4999999999999999e-05, 'epoch': 0.12}
+{'loss': 5.1709, 'learning_rate': 1.5599999999999996e-05, 'epoch': 0.13}
+{'loss': 4.9962, 'learning_rate': 1.6199999999999997e-05, 'epoch': 0.13}
+ 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:52:53,665 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.9779, 'learning_rate': 1.7999999999999997e-05, 'epoch': 0.14}
+{'loss': 4.9887, 'learning_rate': 1.8599999999999998e-05, 'epoch': 0.15}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:53:42,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.9495, 'learning_rate': 1.92e-05, 'epoch': 0.15}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:53:47,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:53:53,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 19:53:59,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:54:03,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.8838, 'learning_rate': 1.98e-05, 'epoch': 0.16}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:54:07,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:54:15,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:54:21,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:54:28,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 19:54:33,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:54:40,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.9915, 'learning_rate': 2.1e-05, 'epoch': 0.17}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:54:50,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:55:02,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.8265, 'learning_rate': 2.1599999999999996e-05, 'epoch': 0.17}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:55:12,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:17,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:19,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:25,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 3%|██▊ | 39/1115 [16:37<6:14:41, 20.89s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.9207, 'learning_rate': 2.2199999999999998e-05, 'epoch': 0.17}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:55:35,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:55:37,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:41,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 19:55:45,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:49,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:51,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:54,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:56,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:58,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:56:00,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 19:56:02,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.8838, 'learning_rate': 2.34e-05, 'epoch': 0.18}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:06,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:08,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:10,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:12,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:14,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:16,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:18,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:20,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:22,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:24,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:26,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:27,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:29,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:31,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:33,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:35,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:38,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:40,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:42,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:43,752 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:46,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:47,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:49,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:52,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:54,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:57,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:56:58,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:00,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:03,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:04,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:07,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:08,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:09,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:12,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:15,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:16,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:18,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:20,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:23,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:25,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:27,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:29,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:31,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:33,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:34,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:36,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:39,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:40,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:43,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6302, 'learning_rate': 2.88e-05, 'epoch': 0.22}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:46,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:50,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:54,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:57:57,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:01,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:05,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:08,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:12,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.4289, 'learning_rate': 2.94e-05, 'epoch': 0.23}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:16,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:19,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:23,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:27,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:30,752 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:34,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:37,897 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:41,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.2769, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.23}
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:45,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:48,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:52,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:55,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:58:59,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:03,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:06,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:10,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:13,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:17,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:20,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.4623, 'learning_rate': 3.119999999999999e-05, 'epoch': 0.24} +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.866, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.25} + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8544, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.26} + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8141, 'learning_rate': 3.36e-05, 'epoch': 0.26} + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8047, 'learning_rate': 3.42e-05, 'epoch': 0.26} + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8444, 'learning_rate': 3.48e-05, 'epoch': 0.27} + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.8777, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.27} + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7374, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.28} + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.8447, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.28} + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7271, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.29} + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]
+{'loss': 4.691, 'learning_rate': 3.78e-05, 'epoch': 0.29}
+{'loss': 4.7024, 'learning_rate': 3.84e-05, 'epoch': 0.3}
+{'loss': 4.6623, 'learning_rate': 3.9e-05, 'epoch': 0.3}
+ 6%|████▊ | 68/1115 [27:08<7:35:02, 26.08s/it]
+{'loss': 4.6917, 'learning_rate': 3.96e-05, 'epoch': 0.3}
+{'loss': 4.7466, 'learning_rate': 4.02e-05, 'epoch': 0.31}
+{'loss': 4.6977, 'learning_rate': 4.08e-05, 'epoch': 0.31}
+{'loss': 4.6366, 'learning_rate': 4.14e-05, 'epoch': 0.32}
+ 6%|█████ | 72/1115 [28:50<7:23:22, 25.51s/it]
+{'loss': 4.6913, 'learning_rate': 4.2e-05, 'epoch': 0.32}
+ 7%|█████▏ | 73/1115 [29:15<7:19:23, 25.30s/it]
+{'loss': 4.5726, 'learning_rate': 4.259999999999999e-05, 'epoch': 0.33}
+[WARNING|modeling_utils.py:388] 2022-03-25 20:08:15,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6529, 'learning_rate': 4.319999999999999e-05, 'epoch': 0.33}
+{'loss': 4.7602, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.34}
+{'loss': 4.6657, 'learning_rate': 4.4399999999999995e-05, 'epoch': 0.34}
+{'loss': 4.5708, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.35}
+[WARNING|modeling_utils.py:388] 2022-03-25 20:09:54,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5891, 'learning_rate': 4.62e-05, 'epoch': 0.35} + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]g-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.695, 'learning_rate': 4.68e-05, 'epoch': 0.36} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5544, 'learning_rate': 4.7399999999999993e-05, 'epoch': 0.36} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6184, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.37} +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:59,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:59,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:59,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.6035, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.37} +[WARNING|modeling_utils.py:388] 2022-03-25 20:11:59,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6037, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.38} +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:12:34,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6785, 'learning_rate': 4.98e-05, 'epoch': 0.38}
+[WARNING|modeling_utils.py:388] 2022-03-25 20:12:48,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:00,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4987, 'learning_rate': 5.04e-05, 'epoch': 0.39}
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:11,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 20:13:19,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:13:25,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:13:31,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:41,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:43,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:43,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5589, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.39} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:47,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:49,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:49,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:13:53,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:13:56,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:14:02,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6228, 'learning_rate': 5.2199999999999995e-05, 'epoch': 0.4} +[WARNING|modeling_utils.py:388] 2022-03-25 20:14:07,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:14:10,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:14,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:16,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:14:20,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:14:22,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:14:24,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:14:26,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:30,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:32,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:34,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:36,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 8%|██████▍ | 91/1115 [35:49<5:21:59, 18.87s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:14:39,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:41,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:43,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:45,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:47,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:48,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:50,879 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:52,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|██████▌ | 92/1115 [36:04<5:05:50, 17.94s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:14:54,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:56,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:58,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:00,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:02,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:05,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:07,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|██████▌ | 93/1115 [36:19<4:49:34, 17.00s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:15:09,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:11,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:13,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:14,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:18,097 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:22,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|██████▋ | 94/1115 [36:34<4:36:05, 16.23s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:15:23,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:25,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:27,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:30,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:31,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:33,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▋ | 95/1115 [36:46<4:16:12, 15.07s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:15:36,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:37,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:40,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:41,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:44,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 9%|██████▊ | 96/1115 [36:57<3:54:36, 13.81s/it] +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:48,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:50,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:51,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:54,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▊ | 97/1115 [37:06<3:32:50, 12.54s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:57,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:59,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:01,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▉ | 98/1115 [37:15<3:10:48, 11.26s/it] +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:06,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:08,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:10,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 9%|███████ | 99/1115 [37:22<2:50:40, 10.08s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:16:12,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:14,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:16,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|██████▉ | 100/1115 [37:29<2:35:08, 9.17s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:16:20,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:23,961 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:27,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:31,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:34,983 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:38,600 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:42,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:45,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████ | 101/1115 [37:59<4:18:26, 15.29s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:16:49,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.9227, 'learning_rate': 5.94e-05, 'epoch': 0.45} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:53,231 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:56,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:00,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:03,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:07,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:11,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:14,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▏ | 102/1115 [38:27<5:25:57, 19.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:17:18,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.8449, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.46} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:21,842 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:25,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:28,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:32,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:36,011 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:39,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:43,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▏ | 103/1115 [38:56<6:11:14, 22.01s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:50,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:50,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:53,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:57,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:57,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:18:00,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:18:00,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:18:03,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:18:07,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 9%|███████▎ | 104/1115 [39:24<6:40:19, 23.76s/it]
+{'loss': 5.1179, 'learning_rate': 6.12e-05, 'epoch': 0.47}
+ 9%|███████▎ | 104/1115 [39:24<6:40:19, 23.76s/it]
+{'loss': 4.9785, 'learning_rate': 6.18e-05, 'epoch': 0.47}
+ 9%|███████▎ | 104/1115 [39:24<6:40:19, 23.76s/it]
+ 10%|███████▍ | 106/1115 [40:19<7:12:54, 25.74s/it]
+{'loss': 4.8924, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.48}
+ 10%|███████▍ | 106/1115 [40:19<7:12:54, 25.74s/it]
+{'loss': 4.7039, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.48}
+ 10%|███████▍ | 106/1115 [40:19<7:12:54, 25.74s/it]
+ 10%|███████▌ | 108/1115 [41:15<7:30:13, 26.83s/it]
+{'loss': 4.7125, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.48}
+ 10%|███████▌ | 108/1115 [41:15<7:30:13, 26.83s/it]
+{'loss': 4.6977, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.49}
+ 10%|███████▌ | 108/1115 [41:15<7:30:13, 26.83s/it]
+{'loss': 4.6824, 'learning_rate': 6.479999999999999e-05, 'epoch': 0.49}
+ 10%|███████▌ | 108/1115 [41:15<7:30:13, 26.83s/it]
+ 10%|███████▊ | 111/1115 [42:35<7:26:51, 26.70s/it]
+{'loss': 4.6191, 'learning_rate': 6.539999999999999e-05, 'epoch': 0.5}
+ 10%|███████▊ | 111/1115 [42:35<7:26:51, 26.70s/it]
+{'loss': 4.6363, 'learning_rate': 6.599999999999999e-05, 'epoch': 0.5}
+ 10%|███████▊ | 111/1115 [42:35<7:26:51, 26.70s/it]
+ 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it]
+{'loss': 4.6369, 'learning_rate': 6.659999999999999e-05, 'epoch': 0.51}
+ 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it]
+{'loss': 4.5246, 'learning_rate': 6.72e-05, 'epoch': 0.51}
+ 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it]
+{'loss': 4.5943, 'learning_rate': 6.78e-05, 'epoch': 0.52}
+ 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it]
+ 10%|████████ | 116/1115 [44:46<7:14:15, 26.08s/it]
+{'loss': 4.5545, 'learning_rate': 6.84e-05, 'epoch': 0.52}
+{'loss': 4.5898, 'learning_rate': 6.9e-05, 'epoch': 0.52}
+{'loss': 4.5237, 'learning_rate': 6.96e-05, 'epoch': 0.53}
+{'loss': 4.5241, 'learning_rate': 7.02e-05, 'epoch': 0.53}
+{'loss': 4.5408, 'learning_rate': 7.079999999999999e-05, 'epoch': 0.54}
+{'loss': 4.4809, 'learning_rate': 7.139999999999999e-05, 'epoch': 0.54}
+{'loss': 4.5278, 'learning_rate': 7.199999999999999e-05, 'epoch': 0.55}
+{'loss': 4.4764, 'learning_rate': 7.259999999999999e-05, 'epoch': 0.55}
+{'loss': 4.4472, 'learning_rate': 7.319999999999999e-05, 'epoch': 0.56}
+[WARNING|modeling_utils.py:388] 2022-03-25 20:27:11,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5971, 'learning_rate': 7.379999999999999e-05, 'epoch': 0.56}
+{'loss': 4.4813, 'learning_rate': 7.439999999999999e-05, 'epoch': 0.57}
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5371, 'learning_rate': 7.5e-05, 'epoch': 0.57} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5538, 'learning_rate': 7.56e-05, 'epoch': 0.57} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5222, 'learning_rate': 7.62e-05, 'epoch': 0.58} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:15,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:15,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:15,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5567, 'learning_rate': 7.68e-05, 'epoch': 0.58} +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:15,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:29:15,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:15,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:29:28,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:29:28,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:29:28,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:34,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:34,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:29:34,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5519, 'learning_rate': 7.74e-05, 'epoch': 0.59} + 12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:29:54,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:54,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:54,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:29:54,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3829, 'learning_rate': 7.8e-05, 'epoch': 0.59} +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:19,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:19,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:30:19,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:19,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▎ | 133/1115 [51:35<6:14:12, 22.86s/it]g-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:30:44,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:30:44,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:30:44,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4999, 'learning_rate': 7.92e-05, 'epoch': 0.6} +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5518, 'learning_rate': 7.98e-05, 'epoch': 0.61} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:10,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:10,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:10,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:10,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:10,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.48, 'learning_rate': 8.04e-05, 'epoch': 0.61} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:35,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:35,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:35,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:35,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:31:43,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:31:43,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:31:43,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:31:43,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5714, 'learning_rate': 8.1e-05, 'epoch': 0.61} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:03,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:03,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:03,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 138/1115 [53:19<5:42:12, 21.02s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:32:09,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 138/1115 [53:19<5:42:12, 21.02s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:32:09,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4858, 'learning_rate': 8.16e-05, 'epoch': 0.62} +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:13,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:09,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:32:13,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:13,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:19,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:22,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:22,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:26,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 139/1115 [53:38<5:31:59, 20.41s/it]
+ 12%|█████████▋ | 139/1115 [53:38<5:31:59, 20.41s/it] +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:30,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:30,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:34,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:34,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:38,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:40,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:32:40,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:44,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 140/1115 [53:56<5:20:34, 19.73s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 140/1115 [53:56<5:20:34, 19.73s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5127, 'learning_rate': 8.28e-05, 'epoch': 0.63} +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:50,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:52,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:54,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:32:56,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:32:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:00,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:03,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:03,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:05,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:07,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:33:09,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:11,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:12,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:16,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:18,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:18,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:33:20,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:22,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:24,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:26,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:28,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:29,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:33,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:33:33,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:35,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:36,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:38,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:40,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:41,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:44,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:33:44,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:47,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:49,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:50,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:53,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:55,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:57,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:33:57,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:33:59,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:01,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:03,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:04,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:07,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:07,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:34:09,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:10,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:13,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:15,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:17,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:17,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:18,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:34:21,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:23,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:25,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:25,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:27,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:29,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:31,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:34:33,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:33,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:35,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:37,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:39,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:39,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:41,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:34:41,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:45,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:45,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:49,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:49,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:52,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:52,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:34:56,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:59,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:34:59,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:03,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:03,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:07,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:07,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:35:07,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:10,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:14,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:14,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:18,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:18,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:21,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:35:25,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:25,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:28,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:28,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:32,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:32,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:35,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:35:35,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:35,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:39,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:42,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:42,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:46,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:46,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:35:49,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:53,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:53,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:56,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:35:56,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:00,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:03,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:03,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:07,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:07,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 5.3625, 'learning_rate': 9.059999999999999e-05, 'epoch': 0.69} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:10,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:14,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:14,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:17,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:17,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:20,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:24,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:24,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:27,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:27,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:31,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.9468, 'learning_rate': 9.12e-05, 'epoch': 0.69} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.8548, 'learning_rate': 9.18e-05, 'epoch': 0.7} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.7858, 'learning_rate': 9.24e-05, 'epoch': 0.7} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6516, 'learning_rate': 9.3e-05, 'epoch': 0.7} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6538, 'learning_rate': 9.36e-05, 'epoch': 0.71} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.5987, 'learning_rate': 9.419999999999999e-05, 'epoch': 0.71} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6631, 'learning_rate': 9.479999999999999e-05, 'epoch': 0.72} +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it] + 14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it] + 14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it] + 14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it] + 14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it] + 14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it] + 14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it]
+{'loss': 4.5287, 'learning_rate': 9.599999999999999e-05, 'epoch': 0.73}
+{'loss': 4.5528, 'learning_rate': 9.659999999999999e-05, 'epoch': 0.73}
+{'loss': 4.5033, 'learning_rate': 9.719999999999999e-05, 'epoch': 0.74}
+ 15%|███████████▏ | 165/1115 [1:02:45<7:12:05, 27.29s/it]
+{'loss': 4.5908, 'learning_rate': 9.779999999999999e-05, 'epoch': 0.74}
+{'loss': 4.4931, 'learning_rate': 9.839999999999999e-05, 'epoch': 0.74}
+{'loss': 4.519, 'learning_rate': 9.9e-05, 'epoch': 0.75}
+{'loss': 4.5015, 'learning_rate': 9.96e-05, 'epoch': 0.75}
+ 15%|███████████▌ | 169/1115 [1:04:31<7:05:23, 26.98s/it]
+{'loss': 4.3689, 'learning_rate': 0.0001002, 'epoch': 0.76}
+{'loss': 4.5586, 'learning_rate': 0.0001008, 'epoch': 0.76}
+{'loss': 4.4859, 'learning_rate': 0.0001014, 'epoch': 0.77}
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.3494, 'learning_rate': 0.000102, 'epoch': 0.77}
+{'loss': 4.4217, 'learning_rate': 0.0001026, 'epoch': 0.78}
+{'loss': 4.3855, 'learning_rate': 0.00010319999999999999, 'epoch': 0.78}
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4697, 'learning_rate': 0.00010379999999999999, 'epoch': 0.78} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4196, 'learning_rate': 0.00010439999999999999, 'epoch': 0.79} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:03,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:03,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:03,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:03,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:03,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4174, 'learning_rate': 0.00010619999999999998, 'epoch': 0.8} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4003, 'learning_rate': 0.00010679999999999998, 'epoch': 0.81} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5487, 'learning_rate': 0.00010739999999999998, 'epoch': 0.81} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:20,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:20,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:24,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:24,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.35, 'learning_rate': 0.00010799999999999998, 'epoch': 0.82} +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4523, 'learning_rate': 0.00010919999999999998, 'epoch': 0.83} + 16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]g-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:37,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:37,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:37,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:49:59,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:49:59,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3919, 'learning_rate': 0.00011039999999999999, 'epoch': 0.83} +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:13,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:50:13,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:13,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:19,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:19,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4118, 'learning_rate': 0.00011099999999999999, 'epoch': 0.84} +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:19,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:26,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:26,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:50:26,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:26,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:34,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:34,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:34,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:40,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:40,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.5025, 'learning_rate': 0.00011159999999999999, 'epoch': 0.84} +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:44,443 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:44,443 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:44,443 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:50,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:50:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4032, 'learning_rate': 0.00011219999999999999, 'epoch': 0.85} +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:50:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:08,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:10,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:51:13,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:13,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:13,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:18,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:18,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:21,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:21,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:25,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:27,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:27,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:31,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:33,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:35,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:51:35,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4694, 'learning_rate': 0.00011339999999999999, 'epoch': 0.86} +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:39,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:41,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:43,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:45,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:47,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:50,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:52,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:52,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 192/1115 [1:13:04<4:46:21, 18.61s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:56,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:58,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:59,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:01,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:03,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:05,702 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████▏ | 193/1115 [1:13:19<4:31:24, 17.66s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:11,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:13,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:16,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:18,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:18,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:22,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:22,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████▏ | 194/1115 [1:13:34<4:18:51, 16.86s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:52:24,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:26,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:24,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:27,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:24,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:30,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:24,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:32,553 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 20:52:24,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:34,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:24,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████▎ | 195/1115 [1:13:47<4:00:14, 15.67s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:52:37,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████▎ | 195/1115 [1:13:47<4:00:14, 15.67s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:52:37,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:38,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:37,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:40,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:37,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:41,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:37,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:44,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:52:37,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:47,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▎ | 196/1115 [1:13:58<3:40:18, 14.38s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:52:48,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:51,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:52,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:54,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:57,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▍ | 197/1115 [1:14:08<3:20:17, 13.09s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:52:58,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:00,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:03,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:05,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▍ | 198/1115 [1:14:17<3:00:05, 11.78s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:53:07,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:09,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:11,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:12,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:14,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:15,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:17,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:19,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▋ | 200/1115 [1:14:32<2:24:20, 9.47s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:53:22,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:26,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:30,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:33,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:37,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:41,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:44,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:48,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▋ | 201/1115 [1:15:01<3:53:55, 15.36s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:53:51,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:55,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:58,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:02,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:05,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:09,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:12,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:15,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▊ | 202/1115 [1:15:29<4:50:10, 19.07s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:54:19,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:22,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:26,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:29,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:32,983 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:36,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:39,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:43,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▊ | 203/1115 [1:15:56<5:26:12, 21.46s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:49,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:53,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:56,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:59,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 20:55:03,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.9813, 'learning_rate': 0.00012119999999999999, 'epoch': 0.91}
+{'loss': 4.7145, 'learning_rate': 0.00012179999999999999, 'epoch': 0.92}
+{'loss': 4.7888, 'learning_rate': 0.0001224, 'epoch': 0.92}
+ 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it]
+{'loss': 4.6485, 'learning_rate': 0.00012299999999999998, 'epoch': 0.93}
+{'loss': 4.6296, 'learning_rate': 0.0001236, 'epoch': 0.93}
+{'loss': 4.4968, 'learning_rate': 0.00012419999999999998, 'epoch': 0.94}
+ 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.514, 'learning_rate': 0.00012479999999999997, 'epoch': 0.94} + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:58:05,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 20:58:05,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5182, 'learning_rate': 0.00012539999999999999, 'epoch': 0.95} + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.534, 'learning_rate': 0.00012599999999999997, 'epoch': 0.95} + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5318, 'learning_rate': 0.0001266, 'epoch': 0.96} + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4489, 'learning_rate': 0.00012719999999999997, 'epoch': 0.96} + 19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:59:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3867, 'learning_rate': 0.0001278, 'epoch': 0.96} +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:01,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:01,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:01,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:00:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:15,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:15,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:15,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:21,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:00:21,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:21,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3823, 'learning_rate': 0.000129, 'epoch': 0.97} +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:27,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:27,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:27,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:27,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:35,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:35,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:39,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:41,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:00:41,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3786, 'learning_rate': 0.00012959999999999998, 'epoch': 0.98} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:46,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:48,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:50,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:52,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:54,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:54,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:58,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|██████████████▉ | 219/1115 [1:22:10<4:59:37, 20.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|██████████████▉ | 219/1115 [1:22:10<4:59:37, 20.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:02,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:04,095 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:05,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:07,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:12,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:12,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|██████████████▉ | 220/1115 [1:22:24<4:34:08, 18.38s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:16,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:19,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:20,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:23,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:24,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:24,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:27,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:26,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:29,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:01:26,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:32,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████▏ | 222/1115 [1:22:45<3:29:46, 14.09s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:01:34,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:37,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:39,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████▏ | 223/1115 [1:22:51<2:55:41, 11.82s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:01:42,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:46,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:50,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:53,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:57,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:00,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:04,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:07,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|███████████████▎ | 224/1115 [1:23:21<4:13:54, 17.10s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:15,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:18,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:22,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:25,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:29,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:32,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:36,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.3673, 'learning_rate': 0.0001338, 'epoch': 1.01}
+{'loss': 5.1522, 'learning_rate': 0.0001344, 'epoch': 1.01}
+ 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it]
+{'loss': 4.7286, 'learning_rate': 0.000135, 'epoch': 1.02}
+{'loss': 4.6139, 'learning_rate': 0.0001356, 'epoch': 1.02}
+{'loss': 4.5005, 'learning_rate': 0.0001362, 'epoch': 1.03}
+{'loss': 4.4463, 'learning_rate': 0.0001368, 'epoch': 1.03}
+{'loss': 4.4191, 'learning_rate': 0.0001374, 'epoch': 1.04}
+{'loss': 4.3203, 'learning_rate': 0.000138, 'epoch': 1.04}
+{'loss': 4.4397, 'learning_rate': 0.0001386, 'epoch': 1.04}
+ 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it]
+ 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3437, 'learning_rate': 0.0001392, 'epoch': 1.05} + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3979, 'learning_rate': 0.00013979999999999998, 'epoch': 1.05} + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4071, 'learning_rate': 0.0001404, 'epoch': 1.06} + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3001, 'learning_rate': 0.00014099999999999998, 'epoch': 1.06} + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.36, 'learning_rate': 0.00014159999999999997, 'epoch': 1.07} + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3493, 'learning_rate': 0.0001422, 'epoch': 1.07} + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4577, 'learning_rate': 0.00014279999999999997, 'epoch': 1.08} + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2136, 'learning_rate': 0.0001434, 'epoch': 1.08} + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2935, 'learning_rate': 0.00014399999999999998, 'epoch': 1.09} + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████��███████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2193, 'learning_rate': 0.0001446, 'epoch': 1.09} + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.3509, 'learning_rate': 0.00014519999999999998, 'epoch': 1.09} + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it] Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:11:11,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:11:11,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:11:11,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.2148, 'learning_rate': 0.0001458, 'epoch': 1.1}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.2056, 'learning_rate': 0.00014639999999999998, 'epoch': 1.1}
+{'loss': 4.2163, 'learning_rate': 0.000147, 'epoch': 1.11}
+ 22%|████████████████▉ | 248/1115 [1:33:42<5:50:14, 24.24s/it]
+{'loss': 4.2646, 'learning_rate': 0.00014759999999999998, 'epoch': 1.11}
+{'loss': 4.1475, 'learning_rate': 0.0001482, 'epoch': 1.12}
+{'loss': 4.1661, 'learning_rate': 0.00014879999999999998, 'epoch': 1.12}
+ 23%|█████████████████ | 251/1115 [1:34:53<5:43:34, 23.86s/it]
+{'loss': 4.0668, 'learning_rate': 0.0001494, 'epoch': 1.13}
+ 23%|█████████████████▏ | 252/1115 [1:35:16<5:39:18, 23.59s/it]
+{'loss': 4.0718, 'learning_rate': 0.00015, 'epoch': 1.13}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:14:19,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.1521, 'learning_rate': 0.00015059999999999997, 'epoch': 1.13}
+{'loss': 4.0864, 'learning_rate': 0.0001512, 'epoch': 1.14}
+ 23%|█████████████████▍ | 255/1115 [1:36:23<5:26:32, 22.78s/it]
+{'loss': 4.0792, 'learning_rate': 0.00015179999999999998, 'epoch': 1.14}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:15:17,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:15:25,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:15:29,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:15:33,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0559, 'learning_rate': 0.0001524, 'epoch': 1.15}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:15:59,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:16:14,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 23%|█████████████████▌ | 258/1115 [1:37:28<5:14:24, 22.01s/it] +{'loss': 4.1454, 'learning_rate': 0.0001536, 'epoch': 1.16}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:16:24,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 21:16:36,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 23%|█████████████████▋ | 259/1115 [1:37:49<5:08:03, 21.59s/it] +{'loss': 4.0752, 'learning_rate': 0.00015419999999999998, 'epoch': 1.16} +[WARNING|modeling_utils.py:388] 2022-03-25 21:16:45,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 21:16:50,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 21:16:55,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 23%|█████████████████▋ | 260/1115 [1:38:09<5:02:00, 21.19s/it] +{'loss': 4.0471, 'learning_rate': 0.0001548, 'epoch': 1.17} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:03,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:09,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:17:13,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:17:19,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.1324, 'learning_rate': 0.00015539999999999998, 'epoch': 1.17} +[WARNING|modeling_utils.py:388] 2022-03-25 21:17:25,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:17:27,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:32,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:17:36,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0548, 'learning_rate': 0.000156, 'epoch': 1.17} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:40,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:17:44,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:50,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:52,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:54,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|█████████████████▉ | 263/1115 [1:39:07<4:40:52, 19.78s/it] +[WARNING|modeling_utils.py:388] 2022-03-25 21:17:58,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:18:00,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:18:02,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:18:04,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:18:07,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:18:09,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:18:11,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:18:13,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.1022, 'learning_rate': 0.0001572, 'epoch': 1.18} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:17,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:19,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:21,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:23,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:25,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:27,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:29,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:31,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:33,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:35,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:36,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:38,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:40,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:42,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:46,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:47,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:49,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:51,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:53,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:56,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:57,924 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:59,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:01,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:02,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:05,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:07,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:10,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:11,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:14,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:15,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:18,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:20,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:21,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:26,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:27,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:29,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:32,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:34,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:36,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:38,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:40,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:42,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:44,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:46,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:48,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:50,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:52,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:56,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:59,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:03,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:06,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:10,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:14,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:17,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:21,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:24,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:28,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:31,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:35,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:38,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:42,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:45,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:49,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.3449, 'learning_rate': 0.0001638, 'epoch': 1.23}
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:53,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:57,442 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:00,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:04,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:07,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:11,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:14,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:18,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.9906, 'learning_rate': 0.0001644, 'epoch': 1.24}
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:21,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:24,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:28,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:31,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:35,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:38,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:41,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.7723, 'learning_rate': 0.000165, 'epoch': 1.24}
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5841, 'learning_rate': 0.0001656, 'epoch': 1.25}
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5655, 'learning_rate': 0.0001662, 'epoch': 1.25}
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5075, 'learning_rate': 0.0001668, 'epoch': 1.26}
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3671, 'learning_rate': 0.0001674, 'epoch': 1.26} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4138, 'learning_rate': 0.000168, 'epoch': 1.26} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3305, 'learning_rate': 0.0001686, 'epoch': 1.27} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2737, 'learning_rate': 0.00016919999999999997, 'epoch': 1.27} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3387, 'learning_rate': 0.00016979999999999998, 'epoch': 1.28} + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.218, 'learning_rate': 0.00017039999999999997, 'epoch': 1.28} + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2599, 'learning_rate': 0.00017099999999999998, 'epoch': 1.29} + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2436, 'learning_rate': 0.00017159999999999997, 'epoch': 1.29} + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it] Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1241, 'learning_rate': 0.00017219999999999998, 'epoch': 1.3} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2546, 'learning_rate': 0.00017279999999999997, 'epoch': 1.3} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1435, 'learning_rate': 0.00017339999999999996, 'epoch': 1.3} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0901, 'learning_rate': 0.00017399999999999997, 'epoch': 1.31} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1774, 'learning_rate': 0.00017459999999999996, 'epoch': 1.31} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1511, 'learning_rate': 0.00017519999999999998, 'epoch': 1.32} + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1222, 'learning_rate': 0.00017579999999999996, 'epoch': 1.32} +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0986, 'learning_rate': 0.00017639999999999998, 'epoch': 1.33} + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0735, 'learning_rate': 0.00017699999999999997, 'epoch': 1.33} + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:30:27,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:30:27,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:30:27,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:30:27,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:30:27,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:30:27,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.0722, 'learning_rate': 0.00017759999999999998, 'epoch': 1.34} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.162, 'learning_rate': 0.00017819999999999997, 'epoch': 1.34} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1414, 'learning_rate': 0.00017879999999999998, 'epoch': 1.35} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:31:43,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:31:43,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:31:43,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:31:43,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:31:43,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.1157, 'learning_rate': 0.00017939999999999997, 'epoch': 1.35} + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0798, 'learning_rate': 0.00017999999999999998, 'epoch': 1.35} +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:26,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:32:26,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.1804, 'learning_rate': 0.00018059999999999997, 'epoch': 1.36} +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0958, 'learning_rate': 0.00018119999999999999, 'epoch': 1.36} +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.0506, 'learning_rate': 0.00018179999999999997, 'epoch': 1.37} +[WARNING|modeling_utils.py:388] 2022-03-25 21:33:34,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:33:42,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 27%|████████████████████▊ | 306/1115 [1:54:57<5:05:16, 22.64s/it] +{'loss': 4.1091, 'learning_rate': 0.0001824, 'epoch': 1.37} +[WARNING|modeling_utils.py:388] 2022-03-25 21:33:50,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:33:56,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 28%|████████████████████▉ | 307/1115 [1:55:19<5:03:12, 22.52s/it] +{'loss': 4.1218, 'learning_rate': 0.00018299999999999998, 'epoch': 1.38} +[WARNING|modeling_utils.py:388] 2022-03-25 21:34:15,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:34:23,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:34:29,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.1636, 'learning_rate': 0.0001836, 'epoch': 1.38} +[WARNING|modeling_utils.py:388] 2022-03-25 21:34:33,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:34:40,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:34:47,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.08, 'learning_rate': 0.00018419999999999998, 'epoch': 1.39} +[WARNING|modeling_utils.py:388] 2022-03-25 21:34:58,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:35:04,423 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:35:10,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0701, 'learning_rate': 0.0001848, 'epoch': 1.39} +[WARNING|modeling_utils.py:388] 2022-03-25 21:35:16,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:35:22,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 28%|█████████████████████▏ | 311/1115 [1:56:41<4:37:29, 20.71s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:35:31,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.9545, 'learning_rate': 0.00018539999999999998, 'epoch': 1.39} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:35:37,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:35:41,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 21:35:45,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 21:35:49,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:35:51,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:35:57,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:36:01,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:36:03,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:36:05,979 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:36:08,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0669, 'learning_rate': 0.00018659999999999998, 'epoch': 1.4} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:12,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 21:36:15,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-25 21:36:18,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:21,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:23,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 28%|█████████████████████▍ | 314/1115 [1:57:36<4:14:00, 19.03s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:36:26,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:28,137 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:30,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:32,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:34,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:36,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:37,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:39,901 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 28%|█████████████████████▍ | 315/1115 [1:57:52<4:01:14, 18.09s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:36:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:43,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:45,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:47,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:49,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:52,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:54,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 28%|█████████████████████▌ | 316/1115 [1:58:06<3:46:57, 17.04s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:36:56,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:58,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:59,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:01,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:04,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:06,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 28%|█████████████████████▌ | 317/1115 [1:58:19<3:31:42, 15.92s/it] +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:11,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:12,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:14,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:17,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:18,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:20,212 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-25 21:37:09,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:20,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:09,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:23,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:21,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:24,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:21,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:27,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:21,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:28,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:21,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:30,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:21,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|█████████████████████▋ | 319/1115 [1:58:43<3:02:46, 13.78s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:37:33,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 29%|█████████████████████▋ | 319/1115 [1:58:43<3:02:46, 13.78s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:37:33,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:34,493 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:33,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:36,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:33,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:39,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:33,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:40,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:33,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|█████████████████████▊ | 320/1115 [1:58:53<2:46:26, 12.56s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:37:42,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|█████████████████████▊ | 320/1115 [1:58:53<2:46:26, 12.56s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:37:42,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:45,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:42,866 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:47,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:42,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:49,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:42,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|█████████████████████▉ | 321/1115 [1:59:01<2:30:08, 11.35s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:37:51,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|█████████████████████▉ | 321/1115 [1:59:01<2:30:08, 11.35s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:37:51,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:53,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:51,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:55,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:51,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:57,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:51,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:57,720 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:37:51,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:59,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:58,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:01,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:58,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:03,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:58,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:03,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:37:58,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████ | 323/1115 [1:59:15<1:58:27, 8.97s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████ | 323/1115 [1:59:15<1:58:27, 8.97s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████ | 323/1115 [1:59:15<1:58:27, 8.97s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:09,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:13,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:13,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:17,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:17,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:20,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:24,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:24,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:27,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:27,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:31,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████ | 324/1115 [1:59:44<3:18:14, 15.04s/it] Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████ | 324/1115 [1:59:44<3:18:14, 15.04s/it] Setting `use_cache=False`...1] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████ | 324/1115 [1:59:44<3:18:14, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████ | 324/1115 [1:59:44<3:18:14, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:38,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:42,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:42,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:45,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:45,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:49,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:52,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:52,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:55,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:55,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:59,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:59,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:59,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▏ | 325/1115 [2:00:13<4:13:33, 19.26s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 29%|██████████████████████▏ | 325/1115 [2:00:13<4:13:33, 19.26s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:07,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:11,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:11,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:14,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:14,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:17,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:24,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:24,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:28,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▏ | 326/1115 [2:00:41<4:47:04, 21.83s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▏ | 326/1115 [2:00:41<4:47:04, 21.83s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▏ | 326/1115 [2:00:41<4:47:04, 21.83s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:35,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:35,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:38,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:41,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:41,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:45,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:45,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:48,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:55,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.9718, 'learning_rate': 0.000195, 'epoch': 1.47} + 29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. 
 Setting `use_cache=False`...
+{'loss': 4.8634, 'learning_rate': 0.00019559999999999998, 'epoch': 1.47} + 29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.2309, 'learning_rate': 0.00019559999999999998, 'epoch': 1.48} + 29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.1864, 'learning_rate': 0.0001962, 'epoch': 1.48} + 29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it] Setting `use_cache=False`...1] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 7.7546, 'learning_rate': 0.00019679999999999999, 'epoch': 1.48}
+{'loss': 7.6626, 'learning_rate': 0.0001974, 'epoch': 1.49}
+{'loss': 7.7001, 'learning_rate': 0.000198, 'epoch': 1.49}
+{'loss': 7.4637, 'learning_rate': 0.0001986, 'epoch': 1.5}
+ 30%|██████████████████████▊ | 335/1115 [2:04:41<5:39:40, 26.13s/it]
+{'loss': 7.272, 'learning_rate': 0.0001992, 'epoch': 1.5}
+{'loss': 7.0608, 'learning_rate': 0.0001998, 'epoch': 1.51}
+{'loss': 6.811, 'learning_rate': 0.0002004, 'epoch': 1.51}
+{'loss': 6.8756, 'learning_rate': 0.000201, 'epoch': 1.52}
+{'loss': 6.0821, 'learning_rate': 0.0002016, 'epoch': 1.52}
+{'loss': 5.2635, 'learning_rate': 0.0002022, 'epoch': 1.52}
+{'loss': 5.1346, 'learning_rate': 0.0002028, 'epoch': 1.53}
+ 31%|███████████████████████▎ | 342/1115 [2:07:40<5:26:12, 25.32s/it]
+{'loss': 4.9469, 'learning_rate': 0.00020339999999999998, 'epoch': 1.53}
+ 31%|███████████████████████▍ | 343/1115 [2:08:05<5:23:36, 25.15s/it]
+{'loss': 4.7854, 'learning_rate': 0.000204, 'epoch': 1.54}
+ 31%|███████████████████████▍ | 343/1115 [2:08:05<5:23:36, 25.15s/it]
+{'loss': 4.4968, 'learning_rate': 0.00020459999999999999, 'epoch': 1.54}
+ 31%|███████████████████████▍ | 343/1115 [2:08:05<5:23:36, 25.15s/it]
+{'loss': 4.5434, 'learning_rate': 0.0002052, 'epoch': 1.55}
+ 31%|███████████████████████▍ | 343/1115 [2:08:05<5:23:36, 25.15s/it]
+ 31%|███████████████████████▌ | 346/1115 [2:09:19<5:17:47, 24.80s/it]
+{'loss': 4.4593, 'learning_rate': 0.0002058, 'epoch': 1.55}
+ 31%|███████████████████████▌ | 346/1115 [2:09:19<5:17:47, 24.80s/it]
+{'loss': 4.4982, 'learning_rate': 0.00020639999999999998, 'epoch': 1.56}
+ 31%|███████████████████████▌ | 346/1115 [2:09:19<5:17:47, 24.80s/it]
+{'loss': 4.5314, 'learning_rate': 0.00020699999999999996, 'epoch': 1.56}
+ 31%|███████████████████████▌ | 346/1115 [2:09:19<5:17:47, 24.80s/it]
+ 31%|███████████████████████▊ | 349/1115 [2:10:30<5:07:36, 24.10s/it]
+{'loss': 4.4604, 'learning_rate': 0.00020759999999999998, 'epoch': 1.57}
+ 31%|███████████████████████▊ | 349/1115 [2:10:30<5:07:36, 24.10s/it]
+{'loss': 4.3669, 'learning_rate': 0.00020819999999999996, 'epoch': 1.57}
+ 31%|███████████████████████▊ | 349/1115 [2:10:30<5:07:36, 24.10s/it]
+{'loss': 4.2684, 'learning_rate': 0.00020879999999999998, 'epoch': 1.57}
+ 31%|███████████████████████▊ | 349/1115 [2:10:30<5:07:36, 24.10s/it]
+[WARNING|modeling_utils.py:388] 2022-03-25 21:50:23,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3108, 'learning_rate': 0.00020939999999999997, 'epoch': 1.58}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:50:23,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:50:46,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3765, 'learning_rate': 0.00020999999999999998, 'epoch': 1.58}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:50:46,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5628, 'learning_rate': 0.00021059999999999997, 'epoch': 1.59}
+[WARNING|modeling_utils.py:388] 2022-03-25 21:50:46,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:51:23,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 21:51:23,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:51:23,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:51:23,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:51:23,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:51:23,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3952, 'learning_rate': 0.00021119999999999996, 'epoch': 1.59} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:51:56,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:51:56,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:52:00,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:00,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2404, 'learning_rate': 0.00021179999999999997, 'epoch': 1.6} +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:04,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:04,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:04,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:10,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:10,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:52:14,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:14,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:14,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:14,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████▎ | 357/1115 [2:13:33<4:44:46, 22.54s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████▎ | 357/1115 [2:13:33<4:44:46, 22.54s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1993, 'learning_rate': 0.00021239999999999996, 'epoch': 1.6} + 32%|████████████████████████▎ | 357/1115 [2:13:33<4:44:46, 22.54s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:52:29,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:29,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:29,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:29,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:29,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:39,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:39,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 32%|████████████████████████▍ | 358/1115 [2:13:54<4:38:15, 22.06s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████▍ | 358/1115 [2:13:54<4:38:15, 22.06s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.24, 'learning_rate': 0.00021299999999999997, 'epoch': 1.61} + 32%|████████████████████████▍ | 358/1115 [2:13:54<4:38:15, 22.06s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:50,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:50,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:54,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:54,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:52:54,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:52:54,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:02,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:02,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:02,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:06,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:06,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:53:10,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:10,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:10,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:10,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:19,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:19,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:19,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 32%|████████████████████████▌ | 360/1115 [2:14:35<4:26:32, 21.18s/it] Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████▌ | 360/1115 [2:14:35<4:26:32, 21.18s/it] Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:27,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:27,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:27,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:27,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:27,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:53:36,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:36,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:36,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:42,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:42,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2105, 'learning_rate': 0.00021479999999999996, 'epoch': 1.62} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:47,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:47,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 21:53:51,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:51,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:51,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:57,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:53:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████▋ | 362/1115 [2:15:13<4:13:27, 20.20s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 32%|████████████████████████▋ | 362/1115 [2:15:13<4:13:27, 20.20s/it]g-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:05,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:07,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:07,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:11,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:11,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:16,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:16,140 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:19,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:22,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:22,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1486, 'learning_rate': 0.00021599999999999996, 'epoch': 1.63} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:26,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:28,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:30,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:32,654 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:34,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:36,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:36,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:40,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 21:54:40,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0632, 'learning_rate': 0.00021659999999999998, 'epoch': 1.63} +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:44,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:46,143 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:48,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:50,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:52,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:53,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1115 [2:16:06<3:46:30, 18.12s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1115 [2:16:06<3:46:30, 18.12s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:57,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:59,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:01,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:03,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:05,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:07,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 366/1115 [2:16:20<3:33:45, 17.12s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 366/1115 [2:16:20<3:33:45, 17.12s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:12,402 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:14,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:15,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:17,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:20,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:22,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████ | 367/1115 [2:16:34<3:19:42, 16.02s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████ | 367/1115 [2:16:34<3:19:42, 16.02s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:27,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:28,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:30,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:31,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:34,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:34,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████ | 368/1115 [2:16:46<3:05:53, 14.93s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:37,834 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:40,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:41,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:43,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████▏ | 369/1115 [2:16:58<2:53:23, 13.95s/it] Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████▏ | 369/1115 [2:16:58<2:53:23, 13.95s/it] Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:49,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:48,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:50,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:48,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:52,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:48,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:55,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:48,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:55,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:48,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████▏ | 370/1115 [2:17:07<2:37:05, 12.65s/it] Setting `use_cache=False`...1] 2022-03-25 21:55:48,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:58,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:57,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:00,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:57,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:02,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:57,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:02,718 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:55:57,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████▎ | 371/1115 [2:17:16<2:20:07, 11.30s/it] Setting `use_cache=False`...1] 2022-03-25 21:55:57,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:07,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:05,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:09,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:05,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:11,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:05,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:11,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:05,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:13,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:12,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:15,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:12,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:17,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:12,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:17,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:12,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████▍ | 373/1115 [2:17:29<1:49:44, 8.87s/it] Setting `use_cache=False`...1] 2022-03-25 21:56:12,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|█████████████████████████▍ | 373/1115 [2:17:29<1:49:44, 8.87s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:23,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:23,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:27,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:27,183 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:30,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:30,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:34,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:37,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:37,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:41,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:41,435 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:45,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:45,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▍ | 374/1115 [2:17:58<3:03:33, 14.86s/it] Setting `use_cache=False`...1] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▍ | 374/1115 [2:17:58<3:03:33, 14.86s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:55,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:55,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:59,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:02,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:02,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:06,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:06,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:09,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:09,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 375/1115 [2:18:27<3:55:51, 19.12s/it] Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 375/1115 [2:18:27<3:55:51, 19.12s/it] Setting `use_cache=False`...1] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 375/1115 [2:18:27<3:55:51, 19.12s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:21,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:21,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:24,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:24,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:28,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:31,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:31,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:35,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:35,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:38,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:41,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:41,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:41,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 376/1115 [2:18:55<4:27:15, 21.70s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 376/1115 [2:18:55<4:27:15, 21.70s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:48,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:48,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:52,118 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:55,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:55,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:58,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:02,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:02,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:05,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:05,599 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:08,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.525, 'learning_rate': 0.0002238, 'epoch': 1.69} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4688, 'learning_rate': 0.00022439999999999998, 'epoch': 1.7} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.3824, 'learning_rate': 0.000225, 'epoch': 1.7} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4258, 'learning_rate': 0.00022559999999999998, 'epoch': 1.7} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4924, 'learning_rate': 0.00022619999999999997, 'epoch': 1.71} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3423, 'learning_rate': 0.00022679999999999998, 'epoch': 1.71} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.439, 'learning_rate': 0.00022739999999999997, 'epoch': 1.72} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2055, 'learning_rate': 0.00022799999999999999, 'epoch': 1.72} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1395, 'learning_rate': 0.00022859999999999997, 'epoch': 1.73} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1919, 'learning_rate': 0.0002292, 'epoch': 1.73} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1744, 'learning_rate': 0.00022979999999999997, 'epoch': 1.74} + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1852, 'learning_rate': 0.0002304, 'epoch': 1.74} + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1599, 'learning_rate': 0.00023099999999999998, 'epoch': 1.74} + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1255, 'learning_rate': 0.0002316, 'epoch': 1.75} + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1317, 'learning_rate': 0.00023219999999999998, 'epoch': 1.75} + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1679, 'learning_rate': 0.0002328, 'epoch': 1.76} + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1555, 'learning_rate': 0.00023339999999999998, 'epoch': 1.76} + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1475, 'learning_rate': 0.000234, 'epoch': 1.77} + 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:05:44,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:05:44,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1145, 'learning_rate': 0.00023459999999999998, 'epoch': 1.77} + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0143, 'learning_rate': 0.0002352, 'epoch': 1.78} + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1586, 'learning_rate': 0.00023579999999999999, 'epoch': 1.78} + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +{'loss': 4.0736, 'learning_rate': 0.0002364, 'epoch': 1.78} + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1268, 'learning_rate': 0.000237, 'epoch': 1.79} + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1166, 'learning_rate': 0.0002376, 'epoch': 1.79} +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0384, 'learning_rate': 0.0002382, 'epoch': 1.8} +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 36%|███████████████████████████▍ | 402/1115 [2:29:53<4:41:12, 23.66s/it]
+{'loss': 3.9957, 'learning_rate': 0.0002388, 'epoch': 1.8}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0793, 'learning_rate': 0.0002394, 'epoch': 1.81}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.1288, 'learning_rate': 0.00023999999999999998, 'epoch': 1.81}
+{'loss': 3.9214, 'learning_rate': 0.0002406, 'epoch': 1.82}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:10:11,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.0041, 'learning_rate': 0.00024119999999999998, 'epoch': 1.82}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:10:29,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 37%|███████████████████████████▋ | 407/1115 [2:31:46<4:26:14, 22.56s/it]
+{'loss': 4.0388, 'learning_rate': 0.0002418, 'epoch': 1.83}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:10:45,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:10:53,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.989, 'learning_rate': 0.00024239999999999998, 'epoch': 1.83}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:11:08,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:11:14,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0942, 'learning_rate': 0.000243, 'epoch': 1.83}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:11:28,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:11:36,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.9554, 'learning_rate': 0.00024359999999999999, 'epoch': 1.84}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:11:43,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:11:46,853 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:11:51,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:11:57,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.0312, 'learning_rate': 0.00024419999999999997, 'epoch': 1.84}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:03,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:05,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:11,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:15,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.9571, 'learning_rate': 0.0002448, 'epoch': 1.85}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:19,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:23,852 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:28,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:31,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:34,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:38,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:40,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:42,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:44,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:48,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:50,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:52,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:54,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:56,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:12:58,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:00,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:02,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:04,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:06,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:06,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:08,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:13:10,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:12,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:16,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:17,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:19,733 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:23,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:13:23,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:25,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:26,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:28,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:30,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:32,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:35,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:13:35,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:37,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:38,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:40,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:43,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:44,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:46,463 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:13:49,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:49,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:50,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:52,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:55,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:56,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:13:58,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:14:01,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:01,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:02,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:05,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:07,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:09,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:09,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:14:11,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:13,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:15,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:17,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:17,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:19,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:22,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:14:23,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:25,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:25,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:28,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:29,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:31,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:31,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:14:32,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:32,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:36,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:40,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:40,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:43,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:43,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:14:47,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:47,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:50,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:54,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:54,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:57,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:14:57,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:14:57,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:01,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:04,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:04,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:08,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:08,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:11,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:15:11,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:15,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:18,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:18,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:21,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:21,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:25,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:15:25,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:29,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:29,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:33,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:33,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:36,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:36,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:15:39,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:43,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:43,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:46,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:49,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:49,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:53,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:15:53,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:53,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:15:56,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:16:00,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:16:00,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:16:03,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:16:03,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:16:06,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:16:10,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:16:13,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:16:16,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:16:19,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:16:23,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:16:26,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]
+{'loss': 4.5779, 'learning_rate': 0.00025439999999999995, 'epoch': 1.92}
+ 38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]
+{'loss': 4.5219, 'learning_rate': 0.00025499999999999996, 'epoch': 1.92}
+ 38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]
+ 39%|█████████████████████████████▎ | 430/1115 [2:38:51<4:42:40, 24.76s/it]
+{'loss': 4.2696, 'learning_rate': 0.0002556, 'epoch': 1.93}
+ 39%|█████████████████████████████▎ | 430/1115 [2:38:51<4:42:40, 24.76s/it]
+{'loss': 4.1901, 'learning_rate': 0.0002562, 'epoch': 1.93}
+ 39%|█████████████████████████████▎ | 430/1115 [2:38:51<4:42:40, 24.76s/it]
+ 39%|█████████████████████████████▍ | 432/1115 [2:39:42<4:46:55, 25.21s/it]
+{'loss': 4.1369, 'learning_rate': 0.00025679999999999995, 'epoch': 1.94}
+ 39%|█████████████████████████████▍ | 432/1115 [2:39:42<4:46:55, 25.21s/it]
+{'loss': 4.1342, 'learning_rate': 0.000258, 'epoch': 1.95}
+{'loss': 4.1208, 'learning_rate': 0.0002586, 'epoch': 1.95}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:20:10,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 39%|█████████████████████████████▊ | 437/1115 [2:41:41<4:27:51, 23.70s/it]
+{'loss': 4.0492, 'learning_rate': 0.00025979999999999997, 'epoch': 1.96}
+ 39%|█████████████████████████████▊ | 437/1115 [2:41:41<4:27:51, 23.70s/it]
+ 39%|█████████████████████████████▊ | 438/1115 [2:42:04<4:25:05, 23.49s/it]
+[WARNING|modeling_utils.py:388] 2022-03-25 22:20:57,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:21:13,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9223, 'learning_rate': 0.000261, 'epoch': 1.97}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:21:13,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:21:24,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:21:34,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9732, 'learning_rate': 0.00026159999999999996, 'epoch': 1.97}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:21:34,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:21:42,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:21:50,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:21:54,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:21:54,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9046, 'learning_rate': 0.0002622, 'epoch': 1.98} +[WARNING|modeling_utils.py:388] 2022-03-25 22:21:58,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:22:00,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:22:03,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:22:05,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:22:05,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:08,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:11,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 40%|██████████████████████████████▏ | 442/1115 [2:43:23<3:46:14, 20.17s/it][WARNING|modeling_bart.py:1051] 2022-03-25 22:22:13,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:15,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:17,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:19,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:20,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:22,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:24,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 40%|██████████████████████████████▏ | 443/1115 [2:43:38<3:28:17, 18.60s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:29,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:31,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:32,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:35,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:39,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 40%|██████████████████████████████▎ | 444/1115 [2:43:51<3:08:57, 16.90s/it][WARNING|modeling_bart.py:1051] 2022-03-25 22:22:40,879 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:43,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:45,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:47,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 40%|██████████████████████████████▎ | 445/1115 [2:44:00<2:43:23, 14.63s/it][WARNING|modeling_bart.py:1051] 2022-03-25 22:22:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:51,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:53,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:56,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 40%|██████████████████████████████▍ | 446/1115 [2:44:07<2:17:02, 12.29s/it][WARNING|modeling_bart.py:1051] 2022-03-25 22:22:57,908 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:01,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:05,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:09,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:12,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:16,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:19,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:23,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 40%|██████████████████████████████▍ | 447/1115 [2:44:36<3:13:05, 17.34s/it][WARNING|modeling_bart.py:1051] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:30,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:33,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:37,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:40,816 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.2469, 'learning_rate': 0.00026639999999999997, 'epoch': 2.01}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.9693, 'learning_rate': 0.000267, 'epoch': 2.01}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5121, 'learning_rate': 0.0002676, 'epoch': 2.02}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5022, 'learning_rate': 0.00026819999999999996, 'epoch': 2.02}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.2283, 'learning_rate': 0.0002688, 'epoch': 2.03}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0816, 'learning_rate': 0.0002694, 'epoch': 2.03} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.091, 'learning_rate': 0.00027, 'epoch': 2.04} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0024, 'learning_rate': 0.00027059999999999996, 'epoch': 2.04} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9794, 'learning_rate': 0.0002712, 'epoch': 2.04} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8077, 'learning_rate': 0.0002718, 'epoch': 2.05} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.86, 'learning_rate': 0.0002724, 'epoch': 2.05} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6727, 'learning_rate': 0.00027299999999999997, 'epoch': 2.06} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.7923, 'learning_rate': 0.0002736, 'epoch': 2.06}
+ 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it]
+{'loss': 3.7753, 'learning_rate': 0.0002742, 'epoch': 2.07}
+{'loss': 3.7465, 'learning_rate': 0.0002748, 'epoch': 2.07}
+ 42%|███████████████████████████████▌ | 463/1115 [2:51:40<4:39:31, 25.72s/it]
+{'loss': 3.712, 'learning_rate': 0.00027539999999999997, 'epoch': 2.08}
+{'loss': 3.5746, 'learning_rate': 0.000276, 'epoch': 2.08}
+{'loss': 3.6256, 'learning_rate': 0.0002766, 'epoch': 2.09}
+{'loss': 3.5238, 'learning_rate': 0.0002772, 'epoch': 2.09}
+{'loss': 3.4411, 'learning_rate': 0.0002778, 'epoch': 2.09}
+{'loss': 3.5065, 'learning_rate': 0.0002784, 'epoch': 2.1}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:32:38,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.5197, 'learning_rate': 0.000279, 'epoch': 2.1}
+{'loss': 3.4927, 'learning_rate': 0.00027959999999999997, 'epoch': 2.11}
+{'loss': 3.5846, 'learning_rate': 0.0002802, 'epoch': 2.11}
+{'loss': 3.4224, 'learning_rate': 0.0002808, 'epoch': 2.12}
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.4171, 'learning_rate': 0.00028139999999999996, 'epoch': 2.12} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3167, 'learning_rate': 0.00028199999999999997, 'epoch': 2.13} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4456, 'learning_rate': 0.0002826, 'epoch': 2.13} +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:34,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:34,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.4501, 'learning_rate': 0.00028319999999999994, 'epoch': 2.13} +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3677, 'learning_rate': 0.00028379999999999996, 'epoch': 2.14} +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2927, 'learning_rate': 0.0002844, 'epoch': 2.14} +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3387, 'learning_rate': 0.000285, 'epoch': 2.15} +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:36:59,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:36:59,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-25 22:36:59,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:04,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:04,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▋ | 480/1115 [2:58:19<3:52:44, 21.99s/it]g-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▋ | 480/1115 [2:58:19<3:52:44, 21.99s/it]g-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.3384, 'learning_rate': 0.00028559999999999995, 'epoch': 2.15} + 43%|████████████████████████████████▋ | 480/1115 [2:58:19<3:52:44, 21.99s/it]g-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▋ | 480/1115 [2:58:19<3:52:44, 21.99s/it]g-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▋ | 480/1115 [2:58:19<3:52:44, 21.99s/it]g-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:19,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:19,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:19,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:19,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:37:19,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2779, 'learning_rate': 0.00028619999999999996, 'epoch': 2.16} +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:49,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:49,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:37:49,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2192, 'learning_rate': 0.0002868, 'epoch': 2.16}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:37:55,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:02,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:08,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:14,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:18,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:22,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:28,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.178, 'learning_rate': 0.00028799999999999995, 'epoch': 2.17}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:28,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:34,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:37,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:41,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:47,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 43%|█████████████████████████████████ | 485/1115 [2:59:59<3:29:28, 19.95s/it]
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:51,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:38:53,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:57,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:59,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:03,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 44%|█████████████████████████████████▏ | 486/1115 [3:00:17<3:23:07, 19.38s/it][WARNING|modeling_bart.py:1051] 2022-03-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-25 22:39:09,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:13,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:15,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:17,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:19,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:21,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:23,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:25,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:27,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:29,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:32,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:34,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:36,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:38,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:39,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:41,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:43,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:45,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:47,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:49,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:52,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:54,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:56,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:58,133 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:39:59,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:03,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:04,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:06,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:08,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:09,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:12,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:14,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:15,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:18,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:20,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:23,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:25,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:27,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:29,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:31,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:32,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:34,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:37,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:39,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:40,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:43,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:45,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:46,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:48,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:50,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:53,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:55,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:57,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:40:59,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:01,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:03,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 2.1926, 'learning_rate': 0.00029519999999999997, 'epoch': 2.22}
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:07,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:11,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:14,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:18,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:21,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:25,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:28,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:32,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:35,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:39,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:42,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:46,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:41:46,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:41:49,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:41:49,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:41:53,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:41:56,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:41:56,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:41:56,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:00,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:00,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:03,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:07,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:07,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:10,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:42:10,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:14,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:14,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:17,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:21,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:21,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:24,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:42:24,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:24,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:28,023 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:31,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:31,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:34,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:38,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-25 22:42:38,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:41,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:41,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:45,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:45,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:48,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-25 22:42:51,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8 +{'loss': 4.3539, 'learning_rate': 0.00029759999999999997, 'epoch': 2.24}
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+03/25/2022 22:52:50 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow +{'eval_loss': 4.050220966339111, 'eval_wer': 1.7867314557715193, 'eval_runtime': 594.0081, 'eval_samples_per_second': 4.448, 'eval_steps_per_second': 0.557, 'epoch': 2.24}