diff --git "a/wandb/run-20220322_163235-2yj5gh94/files/output.log" "b/wandb/run-20220322_163235-2yj5gh94/files/output.log" new file mode 100644--- /dev/null +++ "b/wandb/run-20220322_163235-2yj5gh94/files/output.log" @@ -0,0 +1,3610 @@ + + 0%| | 0/2230 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:39,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:40,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:41,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:42,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:42,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:44,149 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:44,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:45,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:46,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:47,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:32:48,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:49,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:50,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.1925, 'learning_rate': 0.0, 'epoch': 0.0} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:51,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:51,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 1/2230 [00:15<9:29:51, 15.34s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:32:53,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:53,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:54,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:55,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:56,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:32:57,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:32:58,314 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:32:58,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:00,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:00,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:01,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:02,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:03,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:04,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.3094, 'learning_rate': 0.0, 'epoch': 0.0} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:05,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:05,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 2/2230 [00:29<9:02:01, 14.60s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:33:07,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:07,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:08,850 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:33:09,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:10,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:11,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:12,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:12,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:14,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:14,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:15,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:16,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:17,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:18,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.0606, 'learning_rate': 6e-07, 'epoch': 0.01} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:19,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:33:19,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 3/2230 [00:43<8:56:35, 14.46s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:33:21,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:22,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:23,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:23,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:24,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:25,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:26,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:27,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:28,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:28,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:30,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:33:30,653 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:31,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:32,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.1729, 'learning_rate': 6e-07, 'epoch': 0.01} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:33,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:34,073 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 4/2230 [00:57<8:47:10, 14.21s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:33:35,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:35,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:36,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:37,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:38,622 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:39,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:40,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:33:40,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:42,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:42,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:43,696 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:44,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:45,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:45,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:47,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:47,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 5/2230 [01:11<8:38:36, 13.99s/it] + 0%|▏ | 5/2230 [01:11<8:38:36, 13.99s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:33:48,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:49,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:50,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:33:51,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:52,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:52,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:53,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:54,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:55,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:56,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:57,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:57,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:33:58,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:33:59,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:00,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:34:01,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 6/2230 [01:24<8:33:23, 13.85s/it] + 0%|▏ | 6/2230 [01:24<8:33:23, 13.85s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:34:02,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:02,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:04,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:04,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:05,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:06,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:07,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:08,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:09,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:09,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:10,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:34:11,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:12,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:13,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:14,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 10.1223, 'learning_rate': 2.4e-06, 'epoch': 0.02} +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:14,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 7/2230 [01:38<8:29:22, 13.75s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:34:15,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:16,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:17,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:18,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:19,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:19,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:20,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:34:21,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:22,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:23,136 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:24,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:24,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:25,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:26,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:27,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:28,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 8/2230 [01:51<8:24:56, 13.63s/it] + 0%|▎ | 8/2230 [01:51<8:24:56, 13.63s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:34:29,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:29,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:30,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:34:31,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:32,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:33,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:34,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:34,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:35,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:36,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:37,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:38,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:39,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:39,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.0957, 'learning_rate': 3.6e-06, 'epoch': 0.02} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:40,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:34:41,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 9/2230 [02:04<8:20:51, 13.53s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:34:42,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:43,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:44,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:44,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:45,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:46,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:47,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:48,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:49,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:49,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:50,734 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:34:51,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:52,382 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:52,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:54,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:54,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 10/2230 [02:18<8:16:19, 13.41s/it] + 0%|▎ | 10/2230 [02:18<8:16:19, 13.41s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:34:55,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:56,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:57,336 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:57,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:34:58,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:34:59,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:00,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:35:01,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:02,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:02,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:03,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:04,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:05,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:05,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:07,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:07,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.4451, 'learning_rate': 4.8e-06, 'epoch': 0.02} + 0%|▍ | 11/2230 [02:31<8:11:15, 13.28s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:35:08,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:09,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:10,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:35:10,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:11,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:12,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:13,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:14,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:15,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:15,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:16,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:17,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:18,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:18,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:19,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:35:20,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 12/2230 [02:44<8:06:54, 13.17s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:35:21,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 7.9992, 'learning_rate': 5.399999999999999e-06, 'epoch': 0.03} +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:22,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:23,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:23,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:24,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:25,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:26,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:28,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:30,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:30,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:31,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:35:32,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:33,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:33,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:34,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:35,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 13/2230 [02:58<8:25:07, 13.67s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:35:36,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:37,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:38,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:38,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:39,724 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:40,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:41,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-22 16:35:41,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:42,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:43,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:44,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:45,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:46,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:46,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:47,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:48,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 14/2230 [03:11<8:16:19, 13.44s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:35:49,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:35:49,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:52,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:55,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:35:58,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▌ | 15/2230 [03:24<8:08:22, 13.23s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:36:02,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:05,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:08,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:11,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▌ | 16/2230 [03:37<8:01:42, 13.05s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:36:14,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:17,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:20,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:24,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▌ | 17/2230 [03:49<7:55:41, 12.90s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:36:27,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:30,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:33,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:36,365 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▋ | 18/2230 [04:01<7:49:06, 12.72s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:36:39,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:42,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:45,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:48,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▋ | 19/2230 [04:14<7:44:57, 12.62s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:36:51,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:54,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:36:58,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:01,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▋ | 20/2230 [04:26<7:41:28, 12.53s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:37:04,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:07,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:10,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:13,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 1%|▋ | 21/2230 [04:39<7:39:23, 12.48s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:37:16,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:19,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:22,716 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:25,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▊ | 22/2230 [04:51<7:36:17, 12.40s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:37:28,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.0982, 'learning_rate': 1.14e-05, 'epoch': 0.05} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:31,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:34,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:37,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▊ | 23/2230 [05:03<7:32:48, 12.31s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:37:40,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:43,879 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:46,879 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:49,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▊ | 24/2230 [05:15<7:29:51, 12.24s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:37:52,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.8474, 'learning_rate': 1.26e-05, 'epoch': 0.05} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:55,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:37:58,876 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:38:01,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▉ | 25/2230 [05:29<7:47:26, 12.72s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.5599, 'learning_rate': 1.3199999999999997e-05, 'epoch': 0.06} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:38:09,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:38:12,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.7598, 'learning_rate': 1.3799999999999998e-05, 'epoch': 0.06} + 1%|▉ | 27/2230 [05:52<7:30:59, 12.28s/it] +{'loss': 5.5277, 'learning_rate': 1.44e-05, 'epoch': 0.06} +{'loss': 5.2876, 'learning_rate': 1.4999999999999999e-05, 'epoch': 0.06}
+[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1947, 'learning_rate': 1.5599999999999996e-05, 'epoch': 0.07} +[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:38:48,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 5.2972, 'learning_rate': 1.68e-05, 'epoch': 0.07} + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█ | 30/2230 [06:27<7:14:02, 11.84s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█▏ | 32/2230 [06:50<7:02:50, 11.54s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█▏ | 32/2230 [06:50<7:02:50, 11.54s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.059, 'learning_rate': 1.74e-05, 'epoch': 0.07} + 1%|█▏ | 32/2230 [06:50<7:02:50, 11.54s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█▏ | 32/2230 [06:50<7:02:50, 11.54s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█▏ | 32/2230 [06:50<7:02:50, 11.54s/it]g-point operations will not be computed-22 16:38:06,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+  1%|█▏        | 32/2230 [06:50<7:02:50, 11.54s/it] +{'loss': 5.1491, 'learning_rate': 1.7999999999999997e-05, 'epoch': 0.07} +[WARNING|modeling_utils.py:388] 2022-03-22 16:39:47,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 5.0233, 'learning_rate': 1.8599999999999998e-05, 'epoch': 0.08}
+[WARNING|modeling_utils.py:388] 2022-03-22 16:39:47,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:39:57,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:40:01,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:40:01,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.9906, 'learning_rate': 1.98e-05, 'epoch': 0.08} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:40:17,970 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +  2%|█▎        | 37/2230 [07:43<6:27:40, 10.61s/it]
+{'loss': 5.091, 'learning_rate': 2.04e-05, 'epoch': 0.08} +  2%|█▎        | 37/2230 [07:43<6:27:40, 10.61s/it] +[WARNING|modeling_utils.py:388] 2022-03-22 16:40:30,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:40:34,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:40:34,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:40:40,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:40:42,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:40:46,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  2%|█▍        | 40/2230 [08:13<6:09:06, 10.11s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:40:53,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:40:58,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:00,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:02,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:04,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:06,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:08,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:10,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:12,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:14,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:15,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:17,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:21,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:22,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:24,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:25,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:30,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:32,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:34,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:36,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:39,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:41,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:43,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:45,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:47,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:49,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7624, 'learning_rate': 2.8199999999999998e-05, 'epoch': 0.11} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:53,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:53,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:41:57,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:01,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:04,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.9032, 'learning_rate': 2.88e-05, 'epoch': 0.11} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:08,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:11,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:15,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:18,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.7495, 'learning_rate': 2.94e-05, 'epoch': 0.12} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:22,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:25,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:29,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:32,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:36,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:36,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:39,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.2808, 'learning_rate': 3.06e-05, 'epoch': 0.12}
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:39,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8594, 'learning_rate': 3.119999999999999e-05, 'epoch': 0.12}
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:39,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.995, 'learning_rate': 3.1799999999999994e-05, 'epoch': 0.13}
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:39,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:39,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 16:42:39,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0311, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.13} + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0366, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.13} + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 57/2230 [10:50<7:46:31, 12.88s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 4.9435, 'learning_rate': 3.36e-05, 'epoch': 0.13} + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8193, 'learning_rate': 3.42e-05, 'epoch': 0.13} + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8357, 'learning_rate': 3.48e-05, 'epoch': 0.14} + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.8625, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.14} + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7677, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.14} + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8096, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.14} + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 4.7863, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.15} + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██ | 59/2230 [11:17<7:53:27, 13.08s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6589, 'learning_rate': 3.78e-05, 'epoch': 0.15} + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7568, 'learning_rate': 3.84e-05, 'epoch': 0.15} + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.7222, 'learning_rate': 3.9e-05, 'epoch': 0.15} + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7684, 'learning_rate': 3.96e-05, 'epoch': 0.15} + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6798, 'learning_rate': 4.02e-05, 'epoch': 0.16} + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.8219, 'learning_rate': 4.08e-05, 'epoch': 0.16} + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8288, 'learning_rate': 4.14e-05, 'epoch': 0.16} + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▎ | 66/2230 [12:49<7:46:55, 12.95s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7101, 'learning_rate': 4.259999999999999e-05, 'epoch': 0.17} + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7489, 'learning_rate': 4.319999999999999e-05, 'epoch': 0.17} + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6794, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.17} + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▌ | 73/2230 [14:15<7:20:02, 12.24s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6963, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.17} + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 3%|██▋ | 77/2230 [15:03<7:15:10, 12.13s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6342, 'learning_rate': 4.56e-05, 'epoch': 0.18} + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7477, 'learning_rate': 4.62e-05, 'epoch': 0.18} + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|██▊ | 79/2230 [15:26<7:00:56, 11.74s/it] Setting `use_cache=False`...1] 2022-03-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:25,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:25,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7116, 'learning_rate': 4.68e-05, 'epoch': 0.18} +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:25,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:25,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:25,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 16:48:37,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:42,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:46,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.7662, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.19} +[WARNING|modeling_utils.py:388] 2022-03-22 16:48:57,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.668, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.19} +[WARNING|modeling_utils.py:388] 2022-03-22 16:49:08,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.7007, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.19}
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:49:16,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|███ | 86/2230 [16:41<6:18:49, 10.60s/it] +{'loss': 4.6541, 'learning_rate': 4.98e-05, 'epoch': 0.19} +[WARNING|modeling_utils.py:388] 2022-03-22 16:49:24,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|███ | 87/2230 [16:51<6:11:12, 10.39s/it]
+[WARNING|modeling_utils.py:388] 2022-03-22 16:49:30,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:49:41,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:49:45,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:49:49,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6933, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.2} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:49:53,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:49:55,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:49:59,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:01,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:03,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:05,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:50:07,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:09,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:11,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:13,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:15,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:50:17,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:18,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:20,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:22,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:24,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:27,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:50:29,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:30,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:33,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:34,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:37,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:50:38,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:41,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:43,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:46,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:48,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:50:49,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:50,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:53,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:55,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.412, 'learning_rate': 5.82e-05, 'epoch': 0.22} +[WARNING|modeling_utils.py:388] 2022-03-22 16:50:58,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:02,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:05,930 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:09,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.1576, 'learning_rate': 5.88e-05, 'epoch': 0.23} +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:13,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:16,524 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:20,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:23,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.8415, 'learning_rate': 5.94e-05, 'epoch': 0.23} +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:26,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:30,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:33,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:37,266 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:51:40,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:44,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:47,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:50,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.1346, 'learning_rate': 6.0599999999999996e-05, 'epoch': 0.23}
+[WARNING|modeling_utils.py:388] 2022-03-22 16:51:50,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.858, 'learning_rate': 6.12e-05, 'epoch': 0.24} +[WARNING|modeling_utils.py:388] 2022-03-22 16:51:50,808 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 5%|███▋ | 106/2230 [19:42<7:24:59, 12.57s/it]
+{'loss': 4.7614, 'learning_rate': 6.18e-05, 'epoch': 0.24} + 5%|███▋ | 106/2230 [19:42<7:24:59, 12.57s/it] +{'loss': 4.7545, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.24} + 5%|███▋ | 106/2230 [19:42<7:24:59, 12.57s/it]
+ 5%|███▋ | 106/2230 [19:42<7:24:59, 12.57s/it] + 5%|███▊ | 108/2230 [20:09<7:36:31, 12.91s/it] +{'loss': 4.6803, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.24} + 5%|███▊ | 108/2230 [20:09<7:36:31, 12.91s/it]
+ 5%|███▊ | 108/2230 [20:09<7:36:31, 12.91s/it] +{'loss': 4.6851, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.24} + 5%|███▊ | 108/2230 [20:09<7:36:31, 12.91s/it] + 5%|███▊ | 110/2230 [20:35<7:41:21, 13.06s/it] +{'loss': 4.7716, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.25} + 5%|███▊ | 110/2230 [20:35<7:41:21, 13.06s/it]
+ 5%|███▊ | 110/2230 [20:35<7:41:21, 13.06s/it] +{'loss': 4.7469, 'learning_rate': 6.479999999999999e-05, 'epoch': 0.25} + 5%|███▊ | 110/2230 [20:35<7:41:21, 13.06s/it]
+ 5%|███▊ | 110/2230 [20:35<7:41:21, 13.06s/it] +{'loss': 4.7064, 'learning_rate': 6.539999999999999e-05, 'epoch': 0.25} + 5%|███▊ | 110/2230 [20:35<7:41:21, 13.06s/it]
+{'loss': 4.6541, 'learning_rate': 6.599999999999999e-05, 'epoch': 0.25} + 5%|███▊ | 110/2230 [20:35<7:41:21, 13.06s/it] + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.5913, 'learning_rate': 6.659999999999999e-05, 'epoch': 0.26} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+ 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.577, 'learning_rate': 6.72e-05, 'epoch': 0.26} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+{'loss': 4.6932, 'learning_rate': 6.78e-05, 'epoch': 0.26} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.6003, 'learning_rate': 6.84e-05, 'epoch': 0.26} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+ 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.6862, 'learning_rate': 6.9e-05, 'epoch': 0.26} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+{'loss': 4.6215, 'learning_rate': 6.96e-05, 'epoch': 0.27} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.6836, 'learning_rate': 7.02e-05, 'epoch': 0.27} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+ 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.4829, 'learning_rate': 7.079999999999999e-05, 'epoch': 0.27} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+{'loss': 4.5853, 'learning_rate': 7.139999999999999e-05, 'epoch': 0.27} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.5679, 'learning_rate': 7.199999999999999e-05, 'epoch': 0.28} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+ 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.5366, 'learning_rate': 7.259999999999999e-05, 'epoch': 0.28} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+ 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.6412, 'learning_rate': 7.319999999999999e-05, 'epoch': 0.28} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.6462, 'learning_rate': 7.379999999999999e-05, 'epoch': 0.28} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it]
+ 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] +{'loss': 4.5981, 'learning_rate': 7.439999999999999e-05, 'epoch': 0.28} + 5%|███▉ | 114/2230 [21:28<7:47:25, 13.25s/it] + 6%|████▍ | 128/2230 [24:20<6:56:54, 11.90s/it]
+{'loss': 4.547, 'learning_rate': 7.5e-05, 'epoch': 0.29} + 6%|████▍ | 128/2230 [24:20<6:56:54, 11.90s/it] +{'loss': 4.4812, 'learning_rate': 7.56e-05, 'epoch': 0.29} + 6%|████▍ | 128/2230 [24:20<6:56:54, 11.90s/it]
+ 6%|████▌ | 130/2230 [24:43<6:47:04, 11.63s/it]g-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▌ | 130/2230 [24:43<6:47:04, 11.63s/it]g-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.635, 'learning_rate': 7.62e-05, 'epoch': 0.29} + 6%|████▌ | 130/2230 [24:43<6:47:04, 11.63s/it]g-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▌ | 130/2230 [24:43<6:47:04, 11.63s/it]g-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▌ | 130/2230 [24:43<6:47:04, 11.63s/it]g-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▌ | 130/2230 [24:43<6:47:04, 11.63s/it]g-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▌ | 130/2230 [24:43<6:47:04, 11.63s/it]g-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:57:32,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 16:57:32,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 16:40:50,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:57:38,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5837, 'learning_rate': 7.74e-05, 'epoch': 0.3}
+[WARNING|modeling_bart.py:1051] 2022-03-22 16:57:50,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4777, 'learning_rate': 7.8e-05, 'epoch': 0.3} +{'loss': 4.5206, 'learning_rate': 7.86e-05, 'epoch': 0.3} +[WARNING|modeling_utils.py:388] 2022-03-22 16:58:10,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6975, 'learning_rate': 7.92e-05, 'epoch': 0.3} +[WARNING|modeling_bart.py:1051] 2022-03-22 16:58:19,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5397, 'learning_rate': 7.98e-05, 'epoch': 0.3} +[WARNING|modeling_utils.py:388] 2022-03-22 16:58:27,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:58:31,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▊ | 137/2230 [25:57<6:02:55, 10.40s/it] +{'loss': 4.5827, 'learning_rate': 8.04e-05, 'epoch': 0.31}
+ 6%|████▊ | 138/2230 [26:08<6:12:47, 10.69s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:58:45,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6354, 'learning_rate': 8.1e-05, 'epoch': 0.31} +[WARNING|modeling_utils.py:388] 2022-03-22 16:58:49,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:58:51,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6395, 'learning_rate': 8.16e-05, 'epoch': 0.31} +[WARNING|modeling_utils.py:388] 2022-03-22 16:58:57,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 16:59:01,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▉ | 140/2230 [26:26<5:44:06, 9.88s/it][WARNING|modeling_bart.py:1051] 2022-03-22 16:59:03,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.654, 'learning_rate': 8.22e-05, 'epoch': 0.31} +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:07,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:09,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:59:11,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:13,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:15,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:17,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:19,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:59:21,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:23,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:26,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:28,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:29,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:31,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:59:34,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:36,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:37,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:40,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:41,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:59:44,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:46,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:47,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:50,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:52,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 16:59:54,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:56,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:58,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 16:59:59,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:01,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.4768, 'learning_rate': 8.819999999999999e-05, 'epoch': 0.34} +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:05,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:08,799 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:12,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:00:15,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:19,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:22,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:26,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:29,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.2105, 'learning_rate': 8.939999999999999e-05, 'epoch': 0.34} +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:33,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:36,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:39,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:43,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.0365, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.34} +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:46,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:50,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:53,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:00:56,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:00:56,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:01:00,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 7%|█████▍ | 155/2230 [28:33<7:00:29, 12.16s/it]
+{'loss': 4.814, 'learning_rate': 9.12e-05, 'epoch': 0.35}
+ 7%|█████▍ | 155/2230 [28:33<7:00:29, 12.16s/it]
+{'loss': 4.6306, 'learning_rate': 9.18e-05, 'epoch': 0.35}
+ 7%|█████▍ | 155/2230 [28:33<7:00:29, 12.16s/it]
+ 7%|█████▍ | 155/2230 [28:33<7:00:29, 12.16s/it]
+{'loss': 4.5816, 'learning_rate': 9.24e-05, 'epoch': 0.35}
+{'loss': 4.6355, 'learning_rate': 9.3e-05, 'epoch': 0.35}
+{'loss': 4.6574, 'learning_rate': 9.36e-05, 'epoch': 0.36}
+{'loss': 4.6793, 'learning_rate': 9.419999999999999e-05, 'epoch': 0.36}
+{'loss': 4.6908, 'learning_rate': 9.479999999999999e-05, 'epoch': 0.36}
+{'loss': 4.6475, 'learning_rate': 9.539999999999999e-05, 'epoch': 0.36}
+{'loss': 4.5892, 'learning_rate': 9.599999999999999e-05, 'epoch': 0.37}
+{'loss': 4.6243, 'learning_rate': 9.659999999999999e-05, 'epoch': 0.37}
+{'loss': 4.4789, 'learning_rate': 9.719999999999999e-05, 'epoch': 0.37}
+ 7%|█████▊ | 166/2230 [30:58<7:24:53, 12.93s/it]
+{'loss': 4.5289, 'learning_rate': 9.779999999999999e-05, 'epoch': 0.37}
+ 7%|█████▊ | 166/2230 [30:58<7:24:53, 12.93s/it]
+ 7%|█████▊ | 167/2230 [31:10<7:20:40, 12.82s/it]
+{'loss': 4.4315, 'learning_rate': 9.839999999999999e-05, 'epoch': 0.37}
+ 7%|█████▊ | 167/2230 [31:10<7:20:40, 12.82s/it]
+ 7%|█████▊ | 167/2230 [31:10<7:20:40, 12.82s/it]
+ 8%|█████▉ | 168/2230 [31:23<7:16:38, 12.71s/it]
+{'loss': 4.505, 'learning_rate': 9.9e-05, 'epoch': 0.38}
+ 8%|█████▉ | 168/2230 [31:23<7:16:38, 12.71s/it]
+ 8%|█████▉ | 169/2230 [31:35<7:11:51, 12.57s/it]
+{'loss': 4.5744, 'learning_rate': 9.96e-05, 'epoch': 0.38}
+ 8%|█████▉ | 169/2230 [31:35<7:11:51, 12.57s/it]
+ 8%|█████▉ | 170/2230 [31:47<7:08:10, 12.47s/it]
+{'loss': 4.5639, 'learning_rate': 0.0001002, 'epoch': 0.38}
+ 8%|█████▉ | 170/2230 [31:47<7:08:10, 12.47s/it]
+ 8%|█████▉ | 170/2230 [31:47<7:08:10, 12.47s/it]
+ 8%|█████▉ | 171/2230 [32:00<7:05:32, 12.40s/it]
+{'loss': 4.563, 'learning_rate': 0.0001008, 'epoch': 0.38}
+ 8%|█████▉ | 171/2230 [32:00<7:05:32, 12.40s/it]
+ 8%|█████▉ | 171/2230 [32:00<7:05:32, 12.40s/it]
+{'loss': 4.6649, 'learning_rate': 0.0001014, 'epoch': 0.39}
+ 8%|█████▉ | 171/2230 [32:00<7:05:32, 12.40s/it]
+ 8%|██████ | 173/2230 [32:24<7:00:36, 12.27s/it]
+ 8%|██████ | 173/2230 [32:24<7:00:36, 12.27s/it]
+{'loss': 4.5898, 'learning_rate': 0.0001026, 'epoch': 0.39}
+ 8%|██████ | 173/2230 [32:24<7:00:36, 12.27s/it]
+ 8%|██████ | 173/2230 [32:24<7:00:36, 12.27s/it]
+{'loss': 4.6714, 'learning_rate': 0.00010319999999999999, 'epoch': 0.39}
+ 8%|██████ | 173/2230 [32:24<7:00:36, 12.27s/it]
+ 8%|██████▏ | 176/2230 [33:01<7:04:09, 12.39s/it]
+{'loss': 4.4916, 'learning_rate': 0.00010379999999999999, 'epoch': 0.39}
+ 8%|██████▏ | 176/2230 [33:01<7:04:09, 12.39s/it]
+{'loss': 4.4783, 'learning_rate': 0.00010439999999999999, 'epoch': 0.4}
+ 8%|██████▏ | 176/2230 [33:01<7:04:09, 12.39s/it]
+[WARNING|modeling_utils.py:388] 2022-03-22 17:05:57,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:05:57,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.549, 'learning_rate': 0.00010499999999999999, 'epoch': 0.4}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:05:57,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 8%|██████▎ | 179/2230 [33:36<6:42:37, 11.78s/it]
+{'loss': 4.6129, 'learning_rate': 0.00010559999999999998, 'epoch': 0.4}
+ 8%|██████▎ | 179/2230 [33:36<6:42:37, 11.78s/it]
+{'loss': 4.6554, 'learning_rate': 0.00010619999999999998, 'epoch': 0.4}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:06:29,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:06:33,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5494, 'learning_rate': 0.00010679999999999998, 'epoch': 0.41}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:06:33,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:06:33,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:06:33,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:06:33,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.5649, 'learning_rate': 0.00010739999999999998, 'epoch': 0.41} +[WARNING|modeling_utils.py:388] 2022-03-22 17:06:51,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:06:51,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:06:56,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:06:56,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:06:56,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:00,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:00,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:00,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:00,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 8%|██████▍ | 184/2230 [34:31<6:16:30, 11.04s/it] + 8%|██████▍ | 184/2230 [34:31<6:16:30, 11.04s/it] +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:10,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:07:10,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:10,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:10,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 8%|██████▍ | 185/2230 [34:41<6:09:00, 10.83s/it] + 8%|██████▍ | 185/2230 [34:41<6:09:00, 10.83s/it] +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:20,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:20,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:07:20,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:20,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:07:20,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 8%|██████▌ | 186/2230 [34:51<6:02:22, 10.64s/it][WARNING|modeling_bart.py:1051] 2022-03-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|██████▌ | 186/2230 [34:51<6:02:22, 10.64s/it][WARNING|modeling_bart.py:1051] 2022-03-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|██████▌ | 186/2230 [34:51<6:02:22, 10.64s/it][WARNING|modeling_bart.py:1051] 2022-03-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4747, 'learning_rate': 0.00011039999999999999, 'epoch': 0.42} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:35,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:49,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:49,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4546, 'learning_rate': 0.00011099999999999999, 'epoch': 0.42} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:49,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:55,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:57,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:07:57,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 8%|██████▌ | 189/2230 [35:22<5:51:19, 10.33s/it] +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:01,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:03,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:08:05,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:05,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:05,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 17:08:09,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:08:12,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:08:14,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:08:16,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:08:16,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5184, 'learning_rate': 0.00011279999999999999, 'epoch': 0.43} +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:19,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:21,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:23,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:23,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:25,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:27,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:08:29,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:31,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:31,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:32,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:34,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:38,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:38,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:08:39,829 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:41,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:42,979 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:42,979 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:46,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:47,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:50,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:08:50,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:51,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:53,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:53,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:56,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:58,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:08:58,313 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:00,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:02,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:02,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:03,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:06,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:06,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:06,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:10,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:10,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:13,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:13,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:17,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:17,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:20,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:20,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:24,452 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:27,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:27,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:31,352 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:31,352 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:34,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:34,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:38,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:38,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:41,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:41,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:45,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:48,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:48,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 5.3307, 'learning_rate': 0.00011999999999999999, 'epoch': 0.46} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:51,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:51,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:55,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 5.0387, 'learning_rate': 0.00012059999999999999, 'epoch': 0.46} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.969, 'learning_rate': 0.00012119999999999999, 'epoch': 0.46} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8809, 'learning_rate': 0.00012179999999999999, 'epoch': 0.46} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6725, 'learning_rate': 0.0001224, 'epoch': 0.46} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.7561, 'learning_rate': 0.00012299999999999998, 'epoch': 0.47} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.7825, 'learning_rate': 0.0001236, 'epoch': 0.47} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.7099, 'learning_rate': 0.00012419999999999998, 'epoch': 0.47} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.5271, 'learning_rate': 0.00012479999999999997, 'epoch': 0.47} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.642, 'learning_rate': 0.00012539999999999999, 'epoch': 0.48} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4652, 'learning_rate': 0.00012599999999999997, 'epoch': 0.48} +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:09:58,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▍ | 214/2230 [39:39<7:22:42, 13.18s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▍ | 214/2230 [39:39<7:22:42, 13.18s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5103, 'learning_rate': 0.0001266, 'epoch': 0.48} + 10%|███████▍ | 214/2230 [39:39<7:22:42, 13.18s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▍ | 214/2230 [39:39<7:22:42, 13.18s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 10%|███████▍ | 214/2230 [39:39<7:22:42, 13.18s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▍ | 214/2230 [39:39<7:22:42, 13.18s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 215/2230 [39:51<7:16:16, 12.99s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 215/2230 [39:51<7:16:16, 12.99s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5699, 'learning_rate': 0.00012719999999999997, 'epoch': 0.48} + 10%|███████▌ | 215/2230 [39:51<7:16:16, 12.99s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 215/2230 [39:51<7:16:16, 12.99s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 215/2230 [39:51<7:16:16, 12.99s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 215/2230 [39:51<7:16:16, 12.99s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5555, 'learning_rate': 0.0001278, 'epoch': 0.48} + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5327, 'learning_rate': 0.00012839999999999998, 'epoch': 0.49} + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5224, 'learning_rate': 0.000129, 'epoch': 0.49} + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4599, 'learning_rate': 0.00012959999999999998, 'epoch': 0.49} + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5868, 'learning_rate': 0.0001302, 'epoch': 0.49} + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4898, 'learning_rate': 0.00013079999999999998, 'epoch': 0.5} + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5415, 'learning_rate': 0.0001314, 'epoch': 0.5} + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▌ | 216/2230 [40:04<7:10:42, 12.83s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5053, 'learning_rate': 0.00013199999999999998, 'epoch': 0.5} + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4692, 'learning_rate': 0.0001326, 'epoch': 0.5} + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|███████▊ | 223/2230 [41:30<6:47:44, 12.19s/it]g-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:27,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:27,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:14:27,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:27,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6647, 'learning_rate': 0.00013319999999999999, 'epoch': 0.5} +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:27,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:39,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:39,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:39,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:14:39,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:07:29,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.5009, 'learning_rate': 0.0001338, 'epoch': 0.51}
+ 10%|███████▉ | 227/2230 [42:19<6:44:10, 12.11s/it]
+[WARNING|modeling_utils.py:388] 2022-03-22 17:15:04,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5966, 'learning_rate': 0.000135, 'epoch': 0.51}
+ 10%|████████ | 229/2230 [42:42<6:34:10, 11.82s/it]
+{'loss': 4.5119, 'learning_rate': 0.0001356, 'epoch': 0.51}
+{'loss': 4.528, 'learning_rate': 0.0001362, 'epoch': 0.52}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:15:32,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:15:37,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:15:41,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6159, 'learning_rate': 0.0001368, 'epoch': 0.52}
+{'loss': 4.4335, 'learning_rate': 0.0001374, 'epoch': 0.52}
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:16:01,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 10%|████████▏ | 233/2230 [43:26<6:15:39, 11.29s/it]
+{'loss': 4.452, 'learning_rate': 0.000138, 'epoch': 0.52}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:16:15,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:16:25,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:16:36,279 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:16:46,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5933, 'learning_rate': 0.0001404, 'epoch': 0.53}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:16:56,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.4843, 'learning_rate': 0.00014099999999999998, 'epoch': 0.53}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:17:02,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 11%|████████▎ | 239/2230 [44:29<5:47:13, 10.46s/it][WARNING|modeling_bart.py:1051] 2022-03-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.4572, 'learning_rate': 0.00014159999999999997, 'epoch': 0.54}
+[WARNING|modeling_utils.py:388] 2022-03-22 17:17:10,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:17:13,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:17:15,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.528, 'learning_rate': 0.0001422, 'epoch': 0.54}
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:19,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:21,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:23,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:25,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:27,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:29,544 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:31,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:33,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:35,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:37,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:38,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:40,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:43,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:45,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:47,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:50,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:51,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:54,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:55,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:58,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:17:59,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:01,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:04,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:05,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:07,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:09,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:12,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:12,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.3235, 'learning_rate': 0.0001482, 'epoch': 0.56}
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:17,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:21,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:24,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:28,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.7329, 'learning_rate': 0.00014879999999999998, 'epoch': 0.56}
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:31,849 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:35,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:38,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:42,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.6422, 'learning_rate': 0.0001494, 'epoch': 0.57}
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:45,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:49,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:52,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:55,951 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.3057, 'learning_rate': 0.00015, 'epoch': 0.57} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:18:59,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:02,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:02,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:06,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.933, 'learning_rate': 0.00015059999999999997, 'epoch': 0.57} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:19:09,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.8009, 'learning_rate': 0.0001512, 'epoch': 0.57}
+{'loss': 4.7696, 'learning_rate': 0.00015179999999999998, 'epoch': 0.57}
+{'loss': 4.6338, 'learning_rate': 0.0001524, 'epoch': 0.58}
+{'loss': 4.5604, 'learning_rate': 0.00015299999999999998, 'epoch': 0.58}
+ 12%|█████████ | 259/2230 [47:40<7:06:07, 12.97s/it]
+{'loss': 4.6174, 'learning_rate': 0.0001536, 'epoch': 0.58}
+{'loss': 4.5476, 'learning_rate': 0.00015419999999999998, 'epoch': 0.58}
+{'loss': 4.6667, 'learning_rate': 0.0001548, 'epoch': 0.59}
+ 12%|█████████▏ | 262/2230 [48:19<7:05:23, 12.97s/it]
+{'loss': 4.5453, 'learning_rate': 0.00015539999999999998, 'epoch': 0.59}
+{'loss': 4.684, 'learning_rate': 0.000156, 'epoch': 0.59}
+{'loss': 4.5324, 'learning_rate': 0.00015659999999999998, 'epoch': 0.59}
+{'loss': 4.5577, 'learning_rate': 0.0001572, 'epoch': 0.59}
+{'loss': 4.492, 'learning_rate': 0.0001578, 'epoch': 0.6}
+ 12%|█████████▎ | 267/2230 [49:24<6:57:57, 12.77s/it]
+{'loss': 4.6238, 'learning_rate': 0.0001584, 'epoch': 0.6}
+ 12%|█████████▎ | 268/2230 [49:36<6:53:50, 12.66s/it]
+{'loss': 4.4791, 'learning_rate': 0.000159, 'epoch': 0.6}
+ 12%|█████████▍ | 269/2230 [49:49<6:51:48, 12.60s/it]
+{'loss': 4.6115, 'learning_rate': 0.0001596, 'epoch': 0.6}
+ 12%|█████████▍ | 270/2230 [50:01<6:49:11, 12.53s/it]
+ 12%|█████████▍ | 271/2230 [50:13<6:45:42, 12.43s/it]
+ 12%|█████████▌ | 272/2230 [50:25<6:41:40, 12.31s/it]
+{'loss': 4.3894, 'learning_rate': 0.000162, 'epoch': 0.61}
+{'loss': 4.448, 'learning_rate': 0.0001626, 'epoch': 0.61}
+{'loss': 4.5522, 'learning_rate': 0.0001632, 'epoch': 0.62} + 12%|█████████▌ | 272/2230 [50:25<6:41:40, 12.31s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▌ | 272/2230 [50:25<6:41:40, 12.31s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▌ | 272/2230 [50:25<6:41:40, 12.31s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▌ | 272/2230 [50:25<6:41:40, 12.31s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4934, 'learning_rate': 0.0001638, 'epoch': 0.62} + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4612, 'learning_rate': 0.0001644, 'epoch': 0.62} + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4943, 'learning_rate': 0.000165, 'epoch': 0.62} + 12%|█████████▋ | 276/2230 [51:15<6:42:18, 12.35s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:24:20,688 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:20,688 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:20,688 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5346, 'learning_rate': 0.0001656, 'epoch': 0.63} + 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 279/2230 [51:49<6:24:41, 11.83s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6029, 'learning_rate': 0.0001662, 'epoch': 0.63} +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:43,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:43,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:43,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 281/2230 [52:12<6:15:13, 11.55s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 281/2230 [52:12<6:15:13, 11.55s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.5479, 'learning_rate': 0.0001668, 'epoch': 0.63} + 13%|█████████▊ | 281/2230 [52:12<6:15:13, 11.55s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▊ | 281/2230 [52:12<6:15:13, 11.55s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:57,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:57,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:57,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:57,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3756, 'learning_rate': 0.0001674, 'epoch': 0.63} +[WARNING|modeling_utils.py:388] 2022-03-22 17:24:57,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:24:57,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:09,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:09,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:09,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:13,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:13,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:13,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:25:13,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 284/2230 [52:45<6:00:00, 11.10s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 284/2230 [52:45<6:00:00, 11.10s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4843, 'learning_rate': 0.0001686, 'epoch': 0.64} +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5724, 'learning_rate': 0.00016919999999999997, 'epoch': 0.64} +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:25,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.476, 'learning_rate': 0.00016979999999999998, 'epoch': 0.64} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:25:46,087 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:25:46,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:50,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:50,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:50,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:50,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:56,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:25:56,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:25:56,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:02,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|██████████ | 288/2230 [53:28<5:53:10, 10.91s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|██████████ | 288/2230 [53:28<5:53:10, 10.91s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.496, 'learning_rate': 0.00017099999999999998, 'epoch': 0.65} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:08,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:08,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:26:12,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:26:12,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4438, 'learning_rate': 0.00017159999999999997, 'epoch': 0.65} +[WARNING|modeling_utils.py:388] 2022-03-22 17:26:16,368 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:26:18,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:26:20,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:26:23,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:26:23,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4612, 'learning_rate': 0.00017219999999999998, 'epoch': 0.65} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:27,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:29,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:31,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:31,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:33,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:35,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:37,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:39,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:39,006 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:40,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:42,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:44,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:44,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:46,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:48,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:51,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:52,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:52,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:54,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:57,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:57,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:26:58,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:01,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:02,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:02,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:05,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:07,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:07,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:09,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:11,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:11,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:13,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:15,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:15,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:17,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:19,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:19,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:19,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4758, 'learning_rate': 0.00017819999999999997, 'epoch': 0.67} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:24,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:24,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:28,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:31,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:31,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:35,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:35,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.5599, 'learning_rate': 0.00017879999999999998, 'epoch': 0.67} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:38,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:42,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:42,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:45,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:45,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:45,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:48,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:52,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:52,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:55,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:55,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:59,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:59,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:27:59,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:02,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:06,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:06,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:09,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:12,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:12,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.996, 'learning_rate': 0.00018059999999999997, 'epoch': 0.68} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8184, 'learning_rate': 0.00018119999999999999, 'epoch': 0.68} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6625, 'learning_rate': 0.00018179999999999997, 'epoch': 0.69} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.666, 'learning_rate': 0.0001824, 'epoch': 0.69} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:28:16,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6589, 'learning_rate': 0.00018299999999999998, 'epoch': 0.69} + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.641, 'learning_rate': 0.0001836, 'epoch': 0.69} + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7132, 'learning_rate': 0.00018419999999999998, 'epoch': 0.7} + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▊ | 308/2230 [56:33<6:53:23, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5677, 'learning_rate': 0.00018539999999999998, 'epoch': 0.7} + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5749, 'learning_rate': 0.000186, 'epoch': 0.7} + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4865, 'learning_rate': 0.00018659999999999998, 'epoch': 0.7} + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5187, 'learning_rate': 0.0001872, 'epoch': 0.71} + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▉ | 311/2230 [57:12<6:52:43, 12.90s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████ | 316/2230 [58:17<6:50:44, 12.88s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████ | 316/2230 [58:17<6:50:44, 12.88s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4829, 'learning_rate': 0.00018779999999999998, 'epoch': 0.71} + 14%|███████████ | 316/2230 [58:17<6:50:44, 12.88s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████ | 316/2230 [58:17<6:50:44, 12.88s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████ | 316/2230 [58:17<6:50:44, 12.88s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████ | 316/2230 [58:17<6:50:44, 12.88s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4654, 'learning_rate': 0.00018839999999999997, 'epoch': 0.71} + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4546, 'learning_rate': 0.00018899999999999999, 'epoch': 0.71} + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.471, 'learning_rate': 0.00018959999999999997, 'epoch': 0.72} + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5822, 'learning_rate': 0.0001902, 'epoch': 0.72} + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 319/2230 [58:55<6:39:30, 12.54s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 321/2230 [59:19<6:34:21, 12.39s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 321/2230 [59:19<6:34:21, 12.39s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4716, 'learning_rate': 0.00019079999999999998, 'epoch': 0.72} + 14%|███████████▏ | 321/2230 [59:19<6:34:21, 12.39s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 321/2230 [59:19<6:34:21, 12.39s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 321/2230 [59:19<6:34:21, 12.39s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▏ | 321/2230 [59:19<6:34:21, 12.39s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3796, 'learning_rate': 0.0001914, 'epoch': 0.72} + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.477, 'learning_rate': 0.00019199999999999998, 'epoch': 0.72} + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.3998, 'learning_rate': 0.0001926, 'epoch': 0.73} + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5151, 'learning_rate': 0.00019319999999999998, 'epoch': 0.73} + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5013, 'learning_rate': 0.0001938, 'epoch': 0.73} + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|███████████▎ | 322/2230 [59:31<6:32:04, 12.33s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:09,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:33:09,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:09,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:09,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:09,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:09,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:09,686 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3593, 'learning_rate': 0.000195, 'epoch': 0.74} +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:24,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:33:24,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:24,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:24,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:24,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:24,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4115, 'learning_rate': 0.00019559999999999998, 'epoch': 0.74} +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:24,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:38,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:33:38,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:38,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:33:38,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5211, 'learning_rate': 0.00019679999999999999, 'epoch': 0.74} + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▏ | 330/2230 [1:01:07<6:12:18, 11.76s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3388, 'learning_rate': 0.0001974, 'epoch': 0.74} + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5557, 'learning_rate': 0.000198, 'epoch': 0.75} + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▎ | 332/2230 [1:01:30<6:03:14, 11.48s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:27,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:34:27,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:27,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:31,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:31,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:31,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:31,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▍ | 335/2230 [1:02:02<5:49:16, 11.06s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 15%|███████████▍ | 335/2230 [1:02:02<5:49:16, 11.06s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5496, 'learning_rate': 0.0001992, 'epoch': 0.75} + 15%|███████████▍ | 335/2230 [1:02:02<5:49:16, 11.06s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▍ | 335/2230 [1:02:02<5:49:16, 11.06s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:47,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:47,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:47,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4278, 'learning_rate': 0.0001998, 'epoch': 0.75} +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4312, 'learning_rate': 0.0002004, 'epoch': 0.76} +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:34:53,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:12,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:12,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4803, 'learning_rate': 0.000201, 'epoch': 0.76} +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:12,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:18,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:18,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▌ | 339/2230 [1:02:45<5:36:11, 10.67s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 15%|███████████▌ | 339/2230 [1:02:45<5:36:11, 10.67s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:24,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:24,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:35:28,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:35:30,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:35:30,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5558, 'learning_rate': 0.0002022, 'epoch': 0.76} +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:34,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:35:36,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:39,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:39,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:39,069 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:35:43,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:35:45,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:35:47,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 15%|███████████▋ | 342/2230 [1:03:12<4:56:41, 9.43s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 15%|███████████▋ | 342/2230 [1:03:12<4:56:41, 9.43s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:50,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:52,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:54,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:56,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:56,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:35:58,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:35:59,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:03,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:03,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:04,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:06,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:07,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:36:11,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:11,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:12,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:15,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:15,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:17,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:19,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:36:21,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:21,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:22,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:24,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:24,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:27,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:29,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:36:29,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:30,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:32,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:32,819 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.0596, 'learning_rate': 0.00020819999999999996, 'epoch': 0.78} +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:36,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:36,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:40,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:36:43,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:43,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:43,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:47,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:47,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:50,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:50,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:36:54,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:57,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:57,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:36:57,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:01,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:01,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:04,572 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:08,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:08,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:11,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:11,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:11,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:14,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:18,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:18,185 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6417, 'learning_rate': 0.00021059999999999997, 'epoch': 0.79} +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.7513, 'learning_rate': 0.00021119999999999996, 'epoch': 0.8} +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.7933, 'learning_rate': 0.00021179999999999997, 'epoch': 0.8} +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6334, 'learning_rate': 0.00021239999999999996, 'epoch': 0.8} +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6693, 'learning_rate': 0.00021299999999999997, 'epoch': 0.8} +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.5336, 'learning_rate': 0.00021359999999999996, 'epoch': 0.8} +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.5268, 'learning_rate': 0.00021419999999999998, 'epoch': 0.81} +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:37:21,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] +{'loss': 4.4647, 'learning_rate': 0.00021479999999999996, 'epoch': 0.81} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]
+{'loss': 4.4749, 'learning_rate': 0.00021539999999999998, 'epoch': 0.81} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] +{'loss': 4.4175, 'learning_rate': 0.00021599999999999996, 'epoch': 0.81} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]
+ 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] +{'loss': 4.4526, 'learning_rate': 0.00021659999999999998, 'epoch': 0.82} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it] + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]
+{'loss': 4.5824, 'learning_rate': 0.00021719999999999997, 'epoch': 0.82} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3921, 'learning_rate': 0.00021779999999999998, 'epoch': 0.82} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5027, 'learning_rate': 0.00021839999999999997, 'epoch': 0.82} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4884, 'learning_rate': 0.00021899999999999998, 'epoch': 0.83} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4069, 'learning_rate': 0.00021959999999999997, 'epoch': 0.83} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3835, 'learning_rate': 0.00022019999999999999, 'epoch': 0.83} + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 16%|████████████▎ | 361/2230 [1:06:24<6:43:39, 12.96s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4292, 'learning_rate': 0.0002214, 'epoch': 0.83} + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3984, 'learning_rate': 0.00022199999999999998, 'epoch': 0.84} + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2254, 'learning_rate': 0.0002226, 'epoch': 0.84} + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4513, 'learning_rate': 0.00022319999999999998, 'epoch': 0.84} + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▋ | 371/2230 [1:08:32<6:27:24, 12.50s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3402, 'learning_rate': 0.0002238, 'epoch': 0.84} +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4062, 'learning_rate': 0.00022439999999999998, 'epoch': 0.85} +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3849, 'learning_rate': 0.000225, 'epoch': 0.85} +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:42:10,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|████████████▉ | 379/2230 [1:10:08<6:05:46, 11.86s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 17%|████████████▉ | 379/2230 [1:10:08<6:05:46, 11.86s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2675, 'learning_rate': 0.00022559999999999998, 'epoch': 0.85} + 17%|████████████▉ | 379/2230 [1:10:08<6:05:46, 11.86s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:42:52,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:42:52,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:42:52,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:42:52,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:42:52,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.3187, 'learning_rate': 0.00022619999999999997, 'epoch': 0.85} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:42:52,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:42:52,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:06,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:06,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:06,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3882, 'learning_rate': 0.00022679999999999998, 'epoch': 0.85} +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:06,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:43:06,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:16,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:16,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:16,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:16,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.426, 'learning_rate': 0.00022739999999999997, 'epoch': 0.86} +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:16,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:16,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:43:16,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 383/2230 [1:10:53<5:48:45, 11.33s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 383/2230 [1:10:53<5:48:45, 11.33s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3767, 'learning_rate': 0.00022799999999999999, 'epoch': 0.86} + 17%|█████████████ | 383/2230 [1:10:53<5:48:45, 11.33s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:36,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:43:36,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 384/2230 [1:11:04<5:43:15, 11.16s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 384/2230 [1:11:04<5:43:15, 11.16s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.3028, 'learning_rate': 0.00022859999999999997, 'epoch': 0.86} + 17%|█████████████ | 384/2230 [1:11:04<5:43:15, 11.16s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 384/2230 [1:11:04<5:43:15, 11.16s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 384/2230 [1:11:04<5:43:15, 11.16s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 384/2230 [1:11:04<5:43:15, 11.16s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████ | 384/2230 [1:11:04<5:43:15, 11.16s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:43:53,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:43:53,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:43:53,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:43:53,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:01,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:01,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2933, 'learning_rate': 0.00022979999999999997, 'epoch': 0.87} +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:01,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:01,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:01,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:11,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:44:11,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4037, 'learning_rate': 0.0002304, 'epoch': 0.87} +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:11,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:11,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:11,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:11,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:11,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 17%|█████████████▏ | 388/2230 [1:11:46<5:33:03, 10.85s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:44:25,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 17:44:30,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:34,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:44:38,430 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:42,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:44,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:46,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:44:50,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:44:52,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:56,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:44:58,379 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:45:00,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:02,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:04,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:06,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:08,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:10,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:45:11,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:13,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:15,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:18,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:20,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:45:21,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:24,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:26,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:28,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:30,032 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:45:32,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:34,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:36,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:38,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:41,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:45:42,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:46,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:50,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:45:54,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:45:57,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:01,279 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:04,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:46:04,773 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:08,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:11,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:15,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:46:18,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:22,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:25,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.7643, 'learning_rate': 0.00023999999999999998, 'epoch': 0.9} +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:28,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:46:32,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:46:32,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 18%|█████████████▊ | 404/2230 [1:14:04<5:56:20, 11.71s/it] +{'loss': 4.5921, 'learning_rate': 0.0002406, 'epoch': 0.91}
+ 18%|█████████████▊ | 404/2230 [1:14:04<5:56:20, 11.71s/it] +{'loss': 4.5268, 'learning_rate': 0.00024119999999999998, 'epoch': 0.91} + 18%|█████████████▊ | 406/2230 [1:14:30<6:22:31, 12.58s/it]
+ 18%|█████████████▊ | 406/2230 [1:14:30<6:22:31, 12.58s/it] +{'loss': 4.5454, 'learning_rate': 0.0002418, 'epoch': 0.91} +{'loss': 4.5719, 'learning_rate': 0.00024239999999999998, 'epoch': 0.91}
+ 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it] +{'loss': 4.5135, 'learning_rate': 0.000243, 'epoch': 0.91}
+ 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it] +{'loss': 4.4509, 'learning_rate': 0.00024359999999999999, 'epoch': 0.92}
+{'loss': 4.395, 'learning_rate': 0.00024419999999999997, 'epoch': 0.92} + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it] +{'loss': 4.5676, 'learning_rate': 0.0002448, 'epoch': 0.92}
+{'loss': 4.3981, 'learning_rate': 0.00024539999999999995, 'epoch': 0.92}
+ 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3338, 'learning_rate': 0.00024599999999999996, 'epoch': 0.93} + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4346, 'learning_rate': 0.0002466, 'epoch': 0.93} + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4811, 'learning_rate': 0.0002472, 'epoch': 0.93} + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3513, 'learning_rate': 0.00024779999999999995, 'epoch': 0.93} + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3441, 'learning_rate': 0.00024839999999999997, 'epoch': 0.93} + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▉ | 408/2230 [1:14:57<6:29:54, 12.84s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▏ | 418/2230 [1:17:05<6:20:17, 12.59s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▏ | 418/2230 [1:17:05<6:20:17, 12.59s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3528, 'learning_rate': 0.00024959999999999994, 'epoch': 0.94} +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2688, 'learning_rate': 0.00025019999999999996, 'epoch': 0.94} +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3758, 'learning_rate': 0.00025079999999999997, 'epoch': 0.94} +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.2928, 'learning_rate': 0.0002514, 'epoch': 0.95} +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.3609, 'learning_rate': 0.00025199999999999995, 'epoch': 0.95} +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:49:46,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4505, 'learning_rate': 0.00025259999999999996, 'epoch': 0.95} +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3456, 'learning_rate': 0.0002532, 'epoch': 0.95} +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:50:52,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 426/2230 [1:18:42<6:04:08, 12.11s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 426/2230 [1:18:42<6:04:08, 12.11s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.407, 'learning_rate': 0.0002538, 'epoch': 0.96} + 19%|██████████████▌ | 426/2230 [1:18:42<6:04:08, 12.11s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 426/2230 [1:18:42<6:04:08, 12.11s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:51:27,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:51:27,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:51:27,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2921, 'learning_rate': 0.00025439999999999995, 'epoch': 0.96} +[WARNING|modeling_utils.py:388] 2022-03-22 17:51:27,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:51:27,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:51:27,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:51:27,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 428/2230 [1:19:04<5:51:10, 11.69s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 19%|██████████████▌ | 428/2230 [1:19:04<5:51:10, 11.69s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2807, 'learning_rate': 0.00025499999999999996, 'epoch': 0.96} + 19%|██████████████▌ | 428/2230 [1:19:04<5:51:10, 11.69s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 428/2230 [1:19:04<5:51:10, 11.69s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▌ | 428/2230 [1:19:04<5:51:10, 11.69s/it]g-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:51:52,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:51:52,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:51:52,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1697, 'learning_rate': 0.0002556, 'epoch': 0.96} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:51:52,070 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:51:52,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:51:52,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▋ | 430/2230 [1:19:27<5:42:26, 11.41s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▋ | 430/2230 [1:19:27<5:42:26, 11.41s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3221, 'learning_rate': 0.0002562, 'epoch': 0.96} + 19%|██████████████▋ | 430/2230 [1:19:27<5:42:26, 11.41s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 19%|██████████████▋ | 430/2230 [1:19:27<5:42:26, 11.41s/it] Setting `use_cache=False`...e computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:11,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 17:52:11,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:11,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:11,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2951, 'learning_rate': 0.00025679999999999995, 'epoch': 0.97} +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:11,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:22,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:22,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:22,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:17:06,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4834, 'learning_rate': 0.00025739999999999997, 'epoch': 0.97} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:52:34,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3382, 'learning_rate': 0.000258, 'epoch': 0.97} +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:42,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.2568, 'learning_rate': 0.0002586, 'epoch': 0.97} +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:48,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 17:52:53,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 20%|██████████████▊ | 435/2230 [1:20:18<5:07:11, 10.27s/it] +[WARNING|modeling_utils.py:388] 2022-03-22 17:52:57,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 17:53:01,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-22 17:53:05,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-22 17:53:09,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:53:11,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|██████████████▉ | 437/2230 [1:20:36<4:46:01, 9.57s/it][WARNING|modeling_bart.py:1051] 2022-03-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:53:15,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:21,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:23,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:25,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:27,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:53:29,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:31,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:32,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:34,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:37,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:53:39,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:41,073 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:42,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:45,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:48,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:53:49,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:51,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:54,047 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:56,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:53:58,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:53:59,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:01,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:03,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.5722, 'learning_rate': 0.00026579999999999996, 'epoch': 1.0} +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:06,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:09,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:54:13,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:17,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 5.523, 'learning_rate': 0.00026639999999999997, 'epoch': 1.0} +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:20,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:24,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:54:27,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:30,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:34,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:54:37,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:41,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:44,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:48,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 17:54:51,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:54,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 17:54:58,327 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6654, 'learning_rate': 0.00026819999999999996, 'epoch': 1.01}
+{'loss': 4.5437, 'learning_rate': 0.0002688, 'epoch': 1.01} +{'loss': 4.4698, 'learning_rate': 0.0002694, 'epoch': 1.01} + 20%|███████████████▍ | 453/2230 [1:23:05<6:23:22, 12.94s/it] +{'loss': 4.3267, 'learning_rate': 0.00027059999999999996, 'epoch': 1.02} + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it] +{'loss': 4.2144, 'learning_rate': 0.0002712, 'epoch': 1.02}
+{'loss': 4.2471, 'learning_rate': 0.0002718, 'epoch': 1.02} + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0365, 'learning_rate': 0.0002724, 'epoch': 1.02} + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 20%|███████████████▌ | 455/2230 [1:23:31<6:27:00, 13.08s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▌ | 458/2230 [1:24:10<6:22:21, 12.95s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▌ | 458/2230 [1:24:10<6:22:21, 12.95s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9685, 'learning_rate': 0.00027299999999999997, 'epoch': 1.03} + 21%|███████████████▌ | 458/2230 [1:24:10<6:22:21, 12.95s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▌ | 458/2230 [1:24:10<6:22:21, 12.95s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▌ | 458/2230 [1:24:10<6:22:21, 12.95s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▌ | 458/2230 [1:24:10<6:22:21, 12.95s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1301, 'learning_rate': 0.0002736, 'epoch': 1.03} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.063, 'learning_rate': 0.0002742, 'epoch': 1.03} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.067, 'learning_rate': 0.0002748, 'epoch': 1.03} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9844, 'learning_rate': 0.00027539999999999997, 'epoch': 1.04} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0705, 'learning_rate': 0.000276, 'epoch': 1.04} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9171, 'learning_rate': 0.0002766, 'epoch': 1.04} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9141, 'learning_rate': 0.0002772, 'epoch': 1.04} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9032, 'learning_rate': 0.0002778, 'epoch': 1.04} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0287, 'learning_rate': 0.0002784, 'epoch': 1.05} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8513, 'learning_rate': 0.000279, 'epoch': 1.05} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8698, 'learning_rate': 0.00027959999999999997, 'epoch': 1.05} + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|███████████████▋ | 459/2230 [1:24:23<6:21:39, 12.93s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0594, 'learning_rate': 0.0002802, 'epoch': 1.05} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9681, 'learning_rate': 0.0002808, 'epoch': 1.06} +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 17:59:21,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9133, 'learning_rate': 0.00028139999999999996, 'epoch': 1.06} + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8527, 'learning_rate': 0.00028199999999999997, 'epoch': 1.06} + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████ | 472/2230 [1:27:04<5:50:22, 11.96s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 474/2230 [1:27:27<5:44:04, 11.76s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 474/2230 [1:27:27<5:44:04, 11.76s/it] Setting `use_cache=False`...e computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7028, 'learning_rate': 0.0002826, 'epoch': 1.06} +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.678, 'learning_rate': 0.00028319999999999994, 'epoch': 1.07} +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:08,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8978, 'learning_rate': 0.00028379999999999996, 'epoch': 1.07} + 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7861, 'learning_rate': 0.0002844, 'epoch': 1.07} + 21%|████████████████▏ | 476/2230 [1:27:52<5:49:56, 11.97s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:45,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:45,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:45,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-22 18:00:45,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 21%|████████████████▎ | 478/2230 [1:28:14<5:37:01, 11.54s/it]g-point operations will not be computed-22 17:53:13,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-22 18:00:54,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 21%|████████████████▎ | 479/2230 [1:28:25<5:31:27, 11.36s/it] +{'loss': 3.8174, 'learning_rate': 0.00028559999999999995, 'epoch': 1.07}
+[WARNING|modeling_utils.py:388] 2022-03-22 18:01:12,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.8163, 'learning_rate': 0.00028619999999999996, 'epoch': 1.08} +[WARNING|modeling_bart.py:1051] 2022-03-22 18:01:22,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.9948, 'learning_rate': 0.0002868, 'epoch': 1.08} +[WARNING|modeling_utils.py:388] 2022-03-22 18:01:28,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.752, 'learning_rate': 0.00028739999999999994, 'epoch': 1.08} +[WARNING|modeling_utils.py:388] 2022-03-22 18:01:38,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 18:01:45,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-22 18:01:50,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.7391, 'learning_rate': 0.00028859999999999997, 'epoch': 1.09} +[WARNING|modeling_utils.py:388] 2022-03-22 18:01:56,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 18:01:59,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 22%|████████████████▌ | 485/2230 [1:29:26<4:51:42, 10.03s/it][WARNING|modeling_bart.py:1051] 2022-03-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7673, 'learning_rate': 0.0002892, 'epoch': 1.09} +[WARNING|modeling_utils.py:388] 2022-03-22 18:02:07,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 18:02:09,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-22 18:02:11,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.6326, 'learning_rate': 0.00028979999999999994, 'epoch': 1.09} +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:15,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:17,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:20,020 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:22,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:24,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:28,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:30,309 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:32,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:34,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:36,442 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:38,371 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:40,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:42,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:43,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:45,721 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:47,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:49,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:52,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:53,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:56,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:02:58,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:00,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:01,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:04,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:06,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:08,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:10,365 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:13,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:14,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.8942, 'learning_rate': 0.0002958, 'epoch': 1.11} +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:18,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:21,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:25,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:29,354 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.571, 'learning_rate': 0.0002964, 'epoch': 1.11} +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:33,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:36,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:40,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:44,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:47,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:51,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:55,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-22 18:03:58,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.784, 'learning_rate': 0.00029759999999999997, 'epoch': 1.12}
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642 +{'loss': 4.4678, 'learning_rate': 0.0002982, 'epoch': 1.12}
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-22 18:04:15,945 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-22 18:02:03,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+03/22/2022 18:14:18 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
+{'eval_loss': 4.352957248687744, 'eval_wer': 1.7716977389924633, 'eval_runtime': 602.6395, 'eval_samples_per_second': 4.384, 'eval_steps_per_second': 0.549, 'epoch': 1.12}
+03/22/2022 18:14:37 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220322_163235-2yj5gh94/run-2yj5gh94.wandb']. This may take a bit of time if the files are large.