diff --git "a/wandb/run-20220328_170142-by95ehra/files/output.log" "b/wandb/run-20220328_170142-by95ehra/files/output.log" new file mode 100644--- /dev/null +++ "b/wandb/run-20220328_170142-by95ehra/files/output.log" @@ -0,0 +1,6371 @@ + + 0%| | 0/1110 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:01:45,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:01:47,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:01:47,859 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:01:49,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:01:49,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:01:51,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:01:51,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:01:53,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:01:53,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:01:54,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:01:55,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:01:56,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:01:57,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:01:58,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:01:59,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:00,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:01,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:02,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:03,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:04,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:05,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:06,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:02:07,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:09,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:09,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:11,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:11,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:12,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:13,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.1809, 'learning_rate': 0.0, 'epoch': 0.01} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:14,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:15,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 1/1110 [00:31<9:51:22, 32.00s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:02:16,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:17,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:18,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:02:19,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:20,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:21,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:22,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:22,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:24,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:24,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:26,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:26,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:27,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:28,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:29,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:02:30,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:31,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:32,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:33,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:34,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:35,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:35,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:37,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:37,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:38,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:39,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:40,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:02:41,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:42,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:43,241 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:44,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 10.3677, 'learning_rate': 6e-07, 'epoch': 0.02} +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:45,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 2/1110 [01:02<9:29:50, 30.86s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:02:46,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:47,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:48,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:49,279 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:50,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:51,064 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:52,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:02:52,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:54,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:54,655 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:55,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:56,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:57,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:02:58,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:02:59,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:00,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:01,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:01,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:03,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:03:03,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:04,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:05,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:06,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:07,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:08,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:09,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:10,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:10,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:12,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:12,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:13,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:03:14,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 3/1110 [01:30<9:12:44, 29.96s/it] + 0%|▏ | 3/1110 [01:30<9:12:44, 29.96s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:03:15,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:16,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:17,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:18,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:19,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:19,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:20,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:21,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:22,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:23,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:24,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:03:25,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:26,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:26,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:28,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:28,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:29,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:30,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:31,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:32,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:33,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:34,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:35,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:03:35,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:36,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:37,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:38,729 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:39,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:40,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:41,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:42,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:42,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 4/1110 [01:59<9:01:46, 29.39s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:03:44,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 10.153, 'learning_rate': 1.2e-06, 'epoch': 0.04} +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:44,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:45,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:03:46,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:47,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:48,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:49,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:50,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:51,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:51,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:52,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:53,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:54,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:55,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:56,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:03:56,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:58,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:03:58,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:03:59,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:00,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:01,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:02,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:03,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:03,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:05,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:05,727 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:06,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:04:07,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:08,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:09,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:10,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 10.1314, 'learning_rate': 1.8e-06, 'epoch': 0.04} +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:10,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▎ | 5/1110 [02:27<8:52:18, 28.90s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:04:12,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:12,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:13,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:14,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:15,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:16,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:17,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:04:17,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:19,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:19,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:20,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:21,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:22,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:23,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:24,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:24,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:25,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:26,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:27,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:04:28,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:29,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:29,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:31,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:31,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:32,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:33,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:34,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:35,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:36,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:36,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:37,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:04:38,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 6/1110 [02:55<8:43:03, 28.43s/it] + 1%|▍ | 6/1110 [02:55<8:43:03, 28.43s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:04:39,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:40,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:41,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:41,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:43,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:43,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:44,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:47,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:48,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:48,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:50,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:04:50,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:51,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:52,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:53,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:53,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:55,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:55,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:56,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:57,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:04:58,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:04:59,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:00,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:05:00,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:01,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:02,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:03,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:04,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:05,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:05,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:06,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:07,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 7/1110 [03:24<8:46:25, 28.64s/it] + 1%|▌ | 7/1110 [03:24<8:46:25, 28.64s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:05:08,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:09,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:10,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:05:11,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:12,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:12,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:13,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:14,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:15,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:16,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:17,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:17,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:18,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:19,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:20,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:05:20,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:22,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:22,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:23,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:24,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:25,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:25,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:27,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:27,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:28,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:29,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:30,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:05:30,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:31,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:32,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.0158, 'learning_rate': 3.6e-06, 'epoch': 0.07} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:33,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:34,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 8/1110 [03:50<8:34:52, 28.03s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:05:35,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:36,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:37,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:37,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:38,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:39,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:40,401 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:05:40,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:42,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:42,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:43,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:44,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:45,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:45,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:47,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:47,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:48,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:49,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:50,335 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:05:50,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:51,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:52,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:53,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:54,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:55,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:55,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:56,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:57,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:05:58,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:05:59,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:00,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:06:00,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 9/1110 [04:17<8:25:28, 27.55s/it] + 1%|▋ | 9/1110 [04:17<8:25:28, 27.55s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:06:01,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:02,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:03,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:04,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:05,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:05,661 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:06,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:07,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:08,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:08,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:09,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:06:10,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:11,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:12,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:13,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:13,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:14,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:15,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:16,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:17,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:18,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:18,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:19,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:06:20,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:21,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:21,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:22,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:23,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:24,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:25,144 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:26,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 8.2283, 'learning_rate': 4.8e-06, 'epoch': 0.09} +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:26,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 10/1110 [04:43<8:16:25, 27.08s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:06:27,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:28,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:29,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:06:30,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:31,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:31,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:32,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:33,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:34,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:34,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:35,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:36,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:37,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:38,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:39,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:06:39,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:40,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:41,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:42,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:43,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:44,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:44,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:45,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:46,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:47,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:47,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:48,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:06:49,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:50,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:50,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:51,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:52,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▊ | 11/1110 [05:09<8:08:39, 26.68s/it] + 1%|▊ | 11/1110 [05:09<8:08:39, 26.68s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:06:54,256 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:06:56,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:00,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:03,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:06,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:09,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:12,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:15,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▊ | 12/1110 [05:34<8:00:37, 26.26s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 7.5517, 'learning_rate': 5.999999999999999e-06, 'epoch': 0.11} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:22,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:25,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:28,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:33,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:36,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:39,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:42,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▉ | 13/1110 [06:01<8:03:48, 26.46s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 7.3315, 'learning_rate': 6.599999999999999e-06, 'epoch': 0.12} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:49,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:52,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:55,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:58,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:01,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:04,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:07,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|▉ | 14/1110 [06:25<7:52:43, 25.88s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:08:10,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:13,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:16,384 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:19,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:22,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:25,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:28,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:31,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 1%|█ | 15/1110 [06:49<7:41:31, 25.29s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:08:34,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:37,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:40,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:43,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:46,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:49,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:51,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:54,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 1%|█▏ | 16/1110 [07:13<7:32:20, 24.81s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:08:58,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.7689, 'learning_rate': 8.4e-06, 'epoch': 0.14} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:00,864 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:03,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:06,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:09,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:12,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:15,288 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:18,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 2%|█▏ | 17/1110 [07:36<7:22:21, 24.28s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:23,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:26,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:29,291 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:32,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:34,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.3901, 'learning_rate': 9.6e-06, 'epoch': 0.16} + 2%|█▎ | 19/1110 [08:21<7:05:40, 23.41s/it] +{'loss': 6.3045, 'learning_rate': 1.02e-05, 'epoch': 0.17} +[WARNING|modeling_utils.py:388] 2022-03-28 17:10:11,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:15,938 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:10:19,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:24,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 6.1671, 'learning_rate': 1.0799999999999998e-05, 'epoch': 0.18} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:30,268 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:32,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:10:36,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:10:38,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:10:40,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:10:42,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.9445, 'learning_rate': 1.14e-05, 'epoch': 0.19} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:46,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:48,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:50,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:52,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:53,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:55,767 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:57,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:00,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:02,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:04,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:07,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:08,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:11,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:12,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:14,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:17,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:19,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:23,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:25,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:26,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.7463, 'learning_rate': 1.3799999999999998e-05, 'epoch': 0.22} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:31,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:35,587 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:39,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:42,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:46,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:50,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:53,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:57,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.6185, 'learning_rate': 1.44e-05, 'epoch': 0.23} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:01,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:04,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:08,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.5114, 'learning_rate': 1.4999999999999999e-05, 'epoch': 0.24}
+ 3%|█▉ | 28/1110 [11:11<7:02:11, 23.41s/it]
+{'loss': 5.3849, 'learning_rate': 1.5599999999999996e-05, 'epoch': 0.25}
+{'loss': 5.3192, 'learning_rate': 1.6199999999999997e-05, 'epoch': 0.26}
+{'loss': 5.2528, 'learning_rate': 1.68e-05, 'epoch': 0.27}
+{'loss': 5.1951, 'learning_rate': 1.74e-05, 'epoch': 0.28}
+{'loss': 5.1437, 'learning_rate': 1.7999999999999997e-05, 'epoch': 0.29}
+ 3%|██▎ | 33/1110 [13:30<8:00:54, 26.79s/it]
+{'loss': 5.1332, 'learning_rate': 1.8599999999999998e-05, 'epoch': 0.3}
+{'loss': 5.082, 'learning_rate': 1.92e-05, 'epoch': 0.3}
+{'loss': 4.9941, 'learning_rate': 1.98e-05, 'epoch': 0.31}
+{'loss': 4.977, 'learning_rate': 2.04e-05, 'epoch': 0.32}
+{'loss': 4.9351, 'learning_rate': 2.1e-05, 'epoch': 0.33}
+{'loss': 4.9614, 'learning_rate': 2.1599999999999996e-05, 'epoch': 0.34} + 3%|██▎ | 33/1110 [13:30<8:00:54, 26.79s/it] +{'loss': 4.8991, 'learning_rate': 2.2199999999999998e-05, 'epoch': 0.35}
+ 3%|██▎ | 33/1110 [13:30<8:00:54, 26.79s/it] + 4%|██▊ | 40/1110 [16:25<7:22:29, 24.81s/it] +{'loss': 4.9255, 'learning_rate': 2.28e-05, 'epoch': 0.36} +[WARNING|modeling_utils.py:388] 2022-03-28 17:18:13,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:18:28,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.8665, 'learning_rate': 2.34e-05, 'epoch': 0.37} +[WARNING|modeling_utils.py:388] 2022-03-28 17:18:28,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.8518, 'learning_rate': 2.3999999999999997e-05, 'epoch': 0.38}
+[WARNING|modeling_utils.py:388] 2022-03-28 17:18:28,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:19:11,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|███ | 43/1110 [17:32<6:49:55, 23.05s/it] +[WARNING|modeling_utils.py:388] 2022-03-28 17:19:19,064 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:19:22,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:19:29,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:19:29,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.8202, 'learning_rate': 2.52e-05, 'epoch': 0.39} +[WARNING|modeling_utils.py:388] 2022-03-28 17:19:43,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:19:49,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:19:53,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:19:55,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 4%|███▏ | 45/1110 [18:14<6:24:29, 21.66s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.9285, 'learning_rate': 2.5799999999999997e-05, 'epoch': 0.4} +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:02,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:04,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:06,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:08,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:10,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:14,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:16,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:18,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:19,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:21,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:23,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:25,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:20:28,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:30,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:31,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:34,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:36,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:38,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:40,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:42,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:44,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:46,068 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:48,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:20:50,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:51,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:54,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:55,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:20:57,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:01,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:05,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:08,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:12,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:16,352 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:19,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:23,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:27,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.2754, 'learning_rate': 2.94e-05, 'epoch': 0.46} +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:30,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:30,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:34,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:34,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:37,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:41,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:41,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 5.3344, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.47} +[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.2162, 'learning_rate': 3.06e-05, 'epoch': 0.48} +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0826, 'learning_rate': 3.119999999999999e-05, 'epoch': 0.48} +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8626, 'learning_rate': 3.1799999999999994e-05, 'epoch': 0.49} +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8319, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.5} + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.771, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.51} + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6895, 'learning_rate': 3.36e-05, 'epoch': 0.52} + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.693, 'learning_rate': 3.42e-05, 'epoch': 0.53} + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.7241, 'learning_rate': 3.48e-05, 'epoch': 0.54} + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.7604, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.55} + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it] + 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it] +{'loss': 4.6755, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.56} + 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
+ 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it] +{'loss': 4.6489, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.57} + 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
+ 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
+{'loss': 4.658, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.57} + 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
+ 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it] +{'loss': 4.6951, 'learning_rate': 3.78e-05, 'epoch': 0.58} + 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 17:27:50,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:27:54,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.5903, 'learning_rate': 3.84e-05, 'epoch': 0.59} +[WARNING|modeling_utils.py:388] 2022-03-28 17:27:54,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:27:54,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:28:15,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:28:15,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.6142, 'learning_rate': 3.9e-05, 'epoch': 0.6} +[WARNING|modeling_utils.py:388] 2022-03-28 17:28:23,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:28:27,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:28:27,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 6%|████▊ | 68/1110 [26:56<6:35:13, 22.76s/it] +{'loss': 4.6912, 'learning_rate': 3.96e-05, 'epoch': 0.61} + 6%|████▊ | 68/1110 [26:56<6:35:13, 22.76s/it] +[WARNING|modeling_utils.py:388] 2022-03-28 17:28:46,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:28:52,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 17:29:00,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6049, 'learning_rate': 4.02e-05, 'epoch': 0.62} +[WARNING|modeling_utils.py:388] 2022-03-28 17:29:00,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:08,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:14,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:16,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:18,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:18,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 6%|████▉ | 70/1110 [27:37<6:12:32, 21.49s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:29:22,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:29:25,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:29:25,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:29,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:31,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:33,194 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:35,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:37,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:37,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:39,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:41,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:43,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:45,065 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:46,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:48,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:50,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:50,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:53,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:55,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:57,065 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:00,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:01,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:04,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:04,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:05,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:07,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:09,401 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:11,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:13,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:13,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:15,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:17,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:19,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:20,681 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:20,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5642, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.67} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:25,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:25,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:29,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:29,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:33,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:33,194 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:36,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:40,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:40,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:44,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:44,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:47,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:51,213 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:51,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:51,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:54,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:54,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:58,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:58,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:01,987 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:01,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:05,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1318, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.69} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1167, 'learning_rate': 4.56e-05, 'epoch': 0.7} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0646, 'learning_rate': 4.62e-05, 'epoch': 0.71} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +{'loss': 4.8794, 'learning_rate': 4.68e-05, 'epoch': 0.72} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6735, 'learning_rate': 4.7399999999999993e-05, 'epoch': 0.73} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6389, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.74} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6545, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.74} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6443, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.75} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6087, 'learning_rate': 4.98e-05, 'epoch': 0.76} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5696, 'learning_rate': 5.04e-05, 'epoch': 0.77} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5662, 'learning_rate': 5.1e-05, 'epoch': 0.78} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6608, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.79} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5441, 'learning_rate': 5.2199999999999995e-05, 'epoch': 0.8} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6029, 'learning_rate': 5.279999999999999e-05, 'epoch': 0.81} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:37:05,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:37:05,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:37:05,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:37:05,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:37:05,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4904, 'learning_rate': 5.339999999999999e-05, 'epoch': 0.82}
+[WARNING|modeling_utils.py:388] 2022-03-28 17:37:05,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:37:31,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 8%|██████▌ | 92/1110 [35:56<6:34:13, 23.24s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 17:37:51,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5483, 'learning_rate': 5.459999999999999e-05, 'epoch': 0.83}
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:04,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 17:38:12,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:38:22,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5813, 'learning_rate': 5.519999999999999e-05, 'epoch': 0.84}
+[WARNING|modeling_utils.py:388] 2022-03-28 17:38:22,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:38:28,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:33,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:35,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 17:38:39,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:38:41,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:38:41,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5633, 'learning_rate': 5.5799999999999994e-05, 'epoch': 0.85} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:45,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:47,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:49,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:51,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:53,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:55,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:57,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:57,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:59,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:01,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:02,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:04,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:08,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:09,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:11,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:11,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:12,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:14,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:17,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:20,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:21,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:21,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:23,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:25,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:27,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:29,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:31,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:31,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:33,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:36,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:37,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:38,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:38,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.492, 'learning_rate': 5.88e-05, 'epoch': 0.9} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:43,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:43,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:46,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:50,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:50,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:54,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:54,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:57,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:57,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:01,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:04,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:04,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:08,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:08,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1473, 'learning_rate': 5.94e-05, 'epoch': 0.91} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:11,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:11,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:15,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:18,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:18,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:21,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:21,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:25,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:28,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:28,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:32,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.2295, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.91} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1966, 'learning_rate': 6.0599999999999996e-05, 'epoch': 0.92} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.9692, 'learning_rate': 6.12e-05, 'epoch': 0.93} + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8494, 'learning_rate': 6.18e-05, 'epoch': 0.94} + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.6873, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.95} + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.639, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.96} +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:01,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5857, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.97}
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:09,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:13,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:19,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 17:43:23,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.5445, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.98}
+[WARNING|modeling_utils.py:388] 2022-03-28 17:43:29,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:43:31,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 17:43:33,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:37,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:39,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:41,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 10%|███████▋ | 110/1110 [41:59<5:48:00, 20.88s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:43:43,231 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:44,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:48,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:49,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:51,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 10%|███████▊ | 111/1110 [42:10<4:59:21, 17.98s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:55,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:56,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:58,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:02,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:06,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:10,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:13,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:17,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 6.9572, 'learning_rate': 6.599999999999999e-05, 'epoch': 1.01}
+ 10%|███████▉ | 113/1110 [43:13<6:47:53, 24.55s/it]
+{'loss': 4.5871, 'learning_rate': 6.659999999999999e-05, 'epoch': 1.02}
+{'loss': 4.4989, 'learning_rate': 6.72e-05, 'epoch': 1.03}
+{'loss': 4.4903, 'learning_rate': 6.78e-05, 'epoch': 1.04}
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it]
+{'loss': 4.4851, 'learning_rate': 6.84e-05, 'epoch': 1.04}
+{'loss': 4.4768, 'learning_rate': 6.9e-05, 'epoch': 1.05}
+{'loss': 4.4474, 'learning_rate': 6.96e-05, 'epoch': 1.06}
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4565, 'learning_rate': 7.02e-05, 'epoch': 1.07} + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4612, 'learning_rate': 7.079999999999999e-05, 'epoch': 1.08} + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4651, 'learning_rate': 7.139999999999999e-05, 'epoch': 1.09} + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3432, 'learning_rate': 7.199999999999999e-05, 'epoch': 1.1} + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3871, 'learning_rate': 7.259999999999999e-05, 'epoch': 1.11} + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3813, 'learning_rate': 7.319999999999999e-05, 'epoch': 1.12} + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3116, 'learning_rate': 7.379999999999999e-05, 'epoch': 1.13} + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2469, 'learning_rate': 7.439999999999999e-05, 'epoch': 1.13} + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2718, 'learning_rate': 7.5e-05, 'epoch': 1.14} + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:07,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:07,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:07,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3074, 'learning_rate': 7.56e-05, 'epoch': 1.15} +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 12%|█████████ | 129/1110 [49:54<6:12:13, 22.77s/it]g-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:40,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:40,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:40,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:46,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:46,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:51:46,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:51:46,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:54,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:54,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:54,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3321, 'learning_rate': 7.68e-05, 'epoch': 1.17} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:01,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:01,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:52:08,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:52:10,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:52:10,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:52:10,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:52:16,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:52:16,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.2847, 'learning_rate': 7.74e-05, 'epoch': 1.18} +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:21,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:23,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:25,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:27,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:29,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:31,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:33,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 12%|█████████▎ | 132/1110 [50:51<5:28:03, 20.13s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:52:35,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:37,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:39,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:41,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:43,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:45,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:46,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▎ | 133/1110 [51:06<5:01:43, 18.53s/it] +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:52,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:53,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:55,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:58,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:59,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 12%|█████████▍ | 134/1110 [51:18<4:28:31, 16.51s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:53:02,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:04,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:05,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:07,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▍ | 135/1110 [51:27<3:51:14, 14.23s/it] +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:12,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:14,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:16,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▌ | 136/1110 [51:34<3:15:47, 12.06s/it] + 12%|█████████▌ | 136/1110 [51:34<3:15:47, 12.06s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:53:18,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:22,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:26,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:29,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:33,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:37,121 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:40,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:44,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 12%|█████████▋ | 137/1110 [52:05<4:47:25, 17.72s/it] + 12%|█████████▋ | 137/1110 [52:05<4:47:25, 17.72s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:53,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:56,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:54:00,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:54:03,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 17:54:07,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.6934, 'learning_rate': 8.16e-05, 'epoch': 1.24}
+{'loss': 5.5843, 'learning_rate': 8.22e-05, 'epoch': 1.25}
+ 13%|█████████▊ | 140/1110 [53:27<6:29:00, 24.06s/it] +{'loss': 5.3033, 'learning_rate': 8.28e-05, 'epoch': 1.26}
+[WARNING|modeling_utils.py:388] 2022-03-28 17:55:36,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.9826, 'learning_rate': 8.34e-05, 'epoch': 1.27}
+ 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it] +{'loss': 4.7511, 'learning_rate': 8.4e-05, 'epoch': 1.28}
+ 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5923, 'learning_rate': 8.459999999999998e-05, 'epoch': 1.29} + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4865, 'learning_rate': 8.519999999999998e-05, 'epoch': 1.3} + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4707, 'learning_rate': 8.579999999999998e-05, 'epoch': 1.3} + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4407, 'learning_rate': 8.639999999999999e-05, 'epoch': 1.31} + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4704, 'learning_rate': 8.699999999999999e-05, 'epoch': 1.32} +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3955, 'learning_rate': 8.759999999999999e-05, 'epoch': 1.33} +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4305, 'learning_rate': 8.819999999999999e-05, 'epoch': 1.34} +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.4178, 'learning_rate': 8.879999999999999e-05, 'epoch': 1.35} + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3738, 'learning_rate': 8.939999999999999e-05, 'epoch': 1.36} + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.345, 'learning_rate': 8.999999999999999e-05, 'epoch': 1.37} + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:00:34,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:00:34,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3189, 'learning_rate': 9.059999999999999e-05, 'epoch': 1.38} +[WARNING|modeling_utils.py:388] 2022-03-28 18:00:38,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:00:38,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:00:38,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:00:46,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:00:51,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.3548, 'learning_rate': 9.12e-05, 'epoch': 1.39}
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:01,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 18:01:06,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:01:13,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3476, 'learning_rate': 9.18e-05, 'epoch': 1.39}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:01:19,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:23,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 18:01:27,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:35,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.3373, 'learning_rate': 9.24e-05, 'epoch': 1.4}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:01:39,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:01:41,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:01:44,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:47,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:49,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:51,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 14%|██████████▋ | 157/1110 [1:00:09<5:20:46, 20.20s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:54,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:56,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:57,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:59,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:01,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:03,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:05,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 14%|██████████▊ | 158/1110 [1:00:24<4:54:18, 18.55s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:10,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:11,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:13,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:16,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:17,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 14%|██████████▉ | 159/1110 [1:00:36<4:23:19, 16.61s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:20,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:21,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:24,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:26,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:28,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:30,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:32,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:34,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 15%|███████████ | 161/1110 [1:00:53<3:13:02, 12.21s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:37,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:41,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:45,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:48,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:52,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:56,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:59,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:03,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 15%|███████████ | 162/1110 [1:01:24<4:42:40, 17.89s/it]
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:12,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:15,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:19,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:22,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:26,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.5254, 'learning_rate': 9.659999999999999e-05, 'epoch': 1.47}
+{'loss': 5.3926, 'learning_rate': 9.719999999999999e-05, 'epoch': 1.48}
+{'loss': 5.1976, 'learning_rate': 9.779999999999999e-05, 'epoch': 1.48}
+{'loss': 4.84, 'learning_rate': 9.839999999999999e-05, 'epoch': 1.49}
+{'loss': 4.6971, 'learning_rate': 9.9e-05, 'epoch': 1.5}
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6292, 'learning_rate': 9.96e-05, 'epoch': 1.51} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5307, 'learning_rate': 0.0001002, 'epoch': 1.52} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4417, 'learning_rate': 0.0001008, 'epoch': 1.53} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it]
+{'loss': 4.471, 'learning_rate': 0.0001014, 'epoch': 1.54}
+ 15%|███████████▊ | 172/1110 [1:05:47<6:36:11, 25.34s/it]
+{'loss': 4.4859, 'learning_rate': 0.000102, 'epoch': 1.55}
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:07:39,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 16%|███████████▊ | 173/1110 [1:06:11<6:30:50, 25.03s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 16%|███████████▊ | 173/1110 [1:06:11<6:30:50, 25.03s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4296, 'learning_rate': 0.0001026, 'epoch': 1.56} +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.415, 'learning_rate': 0.00010319999999999999, 'epoch': 1.57} +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.346, 'learning_rate': 0.00010379999999999999, 'epoch': 1.57} +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3523, 'learning_rate': 0.00010439999999999999, 'epoch': 1.58} +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.2742, 'learning_rate': 0.00010499999999999999, 'epoch': 1.59} +[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:09:48,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:09:52,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.3161, 'learning_rate': 0.00010559999999999998, 'epoch': 1.6} +[WARNING|modeling_utils.py:388] 2022-03-28 18:09:52,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:04,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:10:04,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:10,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.3355, 'learning_rate': 0.00010619999999999998, 'epoch': 1.61} +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:10,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:10:21,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:27,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:33,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:10:33,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.2709, 'learning_rate': 0.00010679999999999998, 'epoch': 1.62} +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:37,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 18:10:41,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:45,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:10:45,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:51,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:10:53,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 4.2814, 'learning_rate': 0.00010739999999999998, 'epoch': 1.63} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:10:57,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:10:59,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:01,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:03,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:05,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:07,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:09,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:11,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:13,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:14,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:16,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:19,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:21,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:23,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:24,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:27,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:29,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:31,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:33,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:35,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:38,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:40,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:42,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:44,567 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:46,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:48,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:50,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:50,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:53,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:57,557 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:01,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:04,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:08,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:11,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:15,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:18,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.436, 'learning_rate': 0.00011099999999999999, 'epoch': 1.68} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:24,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:28,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:31,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:35,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:38,540 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:41,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:45,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.7089, 'learning_rate': 0.00011159999999999999, 'epoch': 1.69} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.6178, 'learning_rate': 0.00011219999999999999, 'epoch': 1.7}
+{'loss': 5.3762, 'learning_rate': 0.00011279999999999999, 'epoch': 1.71}
+{'loss': 4.9978, 'learning_rate': 0.00011339999999999999, 'epoch': 1.72}
+{'loss': 4.7483, 'learning_rate': 0.00011399999999999999, 'epoch': 1.73}
+{'loss': 4.6654, 'learning_rate': 0.0001146, 'epoch': 1.74}
+{'loss': 4.585, 'learning_rate': 0.0001152, 'epoch': 1.74}
+{'loss': 4.4605, 'learning_rate': 0.0001158, 'epoch': 1.75}
+{'loss': 4.4682, 'learning_rate': 0.0001164, 'epoch': 1.76}
+{'loss': 4.4438, 'learning_rate': 0.000117, 'epoch': 1.77}
+{'loss': 4.3513, 'learning_rate': 0.0001176, 'epoch': 1.78}
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3653, 'learning_rate': 0.0001182, 'epoch': 1.79} + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2853, 'learning_rate': 0.0001188, 'epoch': 1.8} + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2774, 'learning_rate': 0.0001194, 'epoch': 1.81} + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.3431, 'learning_rate': 0.00011999999999999999, 'epoch': 1.82}
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:19:06,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 18%|█████████████▉ | 203/1110 [1:17:25<5:55:16, 23.50s/it]
+{'loss': 4.2969, 'learning_rate': 0.00012059999999999999, 'epoch': 1.83}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:19:16,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3317, 'learning_rate': 0.00012119999999999999, 'epoch': 1.83}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:19:35,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:19:45,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.2546, 'learning_rate': 0.00012179999999999999, 'epoch': 1.84}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:19:55,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:03,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:10,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 19%|██████████████ | 206/1110 [1:18:28<5:29:26, 21.87s/it]
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:14,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:16,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:18,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:22,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:24,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:26,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:28,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:32,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:34,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:36,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:37,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:39,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:41,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:43,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:46,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:48,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:49,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:52,718 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:54,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:56,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:20:58,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:00,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:02,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:04,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:07,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:08,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:10,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:12,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:12,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:14,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:14,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:18,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:18,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:25,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:28,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:28,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:21:32,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:32,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:35,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:44,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:44,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:48,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:51,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:51,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:54,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:54,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:21:58,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.3656, 'learning_rate': 0.0001266, 'epoch': 1.91}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 5.2901, 'learning_rate': 0.00012719999999999997, 'epoch': 1.92}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.9679, 'learning_rate': 0.0001278, 'epoch': 1.93}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.74, 'learning_rate': 0.00012839999999999998, 'epoch': 1.94}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.6402, 'learning_rate': 0.000129, 'epoch': 1.95}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:02,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:02,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:02,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:09,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:09,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:09,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:09,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 4.5565, 'learning_rate': 0.00012959999999999998, 'epoch': 1.96}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.3116, 'learning_rate': 0.0001302, 'epoch': 1.97}
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:41,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:41,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:41,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:41,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:49,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:49,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:49,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:55,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:55,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:24:55,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:25:00,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:25:00,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:04,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:06,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:10,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:12,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:13,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:13,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:15,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:17,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:20,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:22,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:23,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:23,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:26,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:28,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:28,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:31,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:34,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:34,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:38,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:38,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:41,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:25:41,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:45,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:49,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:49,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:52,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:52,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:25:52,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 6.599, 'learning_rate': 0.0001326, 'epoch': 2.01} +{'loss': 4.4173, 'learning_rate': 0.00013319999999999999, 'epoch': 2.02} +{'loss': 4.423, 'learning_rate': 0.0001338, 'epoch': 2.03} +{'loss': 4.3792, 'learning_rate': 0.0001344, 'epoch': 2.04}
+{'loss': 4.3279, 'learning_rate': 0.000135, 'epoch': 2.04} +{'loss': 4.2576, 'learning_rate': 0.0001356, 'epoch': 2.05} +{'loss': 4.1748, 'learning_rate': 0.0001362, 'epoch': 2.06} +{'loss': 4.1601, 'learning_rate': 0.0001368, 'epoch': 2.07}
+ 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it] +{'loss': 4.1532, 'learning_rate': 0.0001374, 'epoch': 2.08}
+{'loss': 4.08, 'learning_rate': 0.000138, 'epoch': 2.09}
+{'loss': 4.089, 'learning_rate': 0.0001386, 'epoch': 2.1}
+{'loss': 4.0562, 'learning_rate': 0.0001392, 'epoch': 2.11}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:31:06,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 21%|████████████████ | 235/1110 [1:29:28<6:02:57, 24.89s/it] +{'loss': 4.0354, 'learning_rate': 0.00013979999999999998, 'epoch': 2.12}
+{'loss': 3.9942, 'learning_rate': 0.0001404, 'epoch': 2.13}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:31:53,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9382, 'learning_rate': 0.00014099999999999998, 'epoch': 2.13}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:32:26,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:32:30,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:32:41,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8866, 'learning_rate': 0.0001422, 'epoch': 2.15}
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:32:53,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 22%|████████████████▍ | 240/1110 [1:31:23<5:29:45, 22.74s/it] +{'loss': 3.8903, 'learning_rate': 0.00014279999999999997, 'epoch': 2.16} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:11,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:17,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 18:33:25,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.801, 'learning_rate': 0.0001434, 'epoch': 2.17} +[WARNING|modeling_utils.py:388] 2022-03-28 18:33:31,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:35,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 18:33:39,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 18:33:42,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 22%|████████████████▌ | 242/1110 [1:32:01<5:02:37, 20.92s/it][WARNING|modeling_bart.py:1051] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:48,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:50,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:52,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:54,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:56,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:02,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 18:34:02,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:04,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:06,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:08,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:09,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:11,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:14,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:34:16,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:16,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:18,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:19,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:22,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:24,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:26,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:34:28,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:28,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:30,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:33,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:35,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:37,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:37,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:34:39,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:41,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:42,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:44,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:44,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:46,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:46,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:34:50,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:50,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:54,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:57,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:34:57,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:01,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:01,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:04,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:04,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:08,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:11,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:11,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:11,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:15,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:15,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:19,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:19,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:22,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:26,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:26,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:29,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:29,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:32,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:36,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:36,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.738, 'learning_rate': 0.0001482, 'epoch': 2.24} +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.4078, 'learning_rate': 0.00014879999999999998, 'epoch': 2.25} +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0056, 'learning_rate': 0.0001494, 'epoch': 2.26} +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.8029, 'learning_rate': 0.00015, 'epoch': 2.27} +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 23%|█████████████████▎ | 253/1110 [1:35:47<6:01:50, 25.33s/it]
+{'loss': 4.6233, 'learning_rate': 0.00015059999999999997, 'epoch': 2.28}
+ 23%|█████████████████▎ | 253/1110 [1:35:47<6:01:50, 25.33s/it]
+{'loss': 4.4243, 'learning_rate': 0.0001512, 'epoch': 2.29}
+ 23%|█████████████████▎ | 253/1110 [1:35:47<6:01:50, 25.33s/it]
+ 23%|█████████████████▎ | 253/1110 [1:35:47<6:01:50, 25.33s/it]
+{'loss': 4.2563, 'learning_rate': 0.00015179999999999998, 'epoch': 2.3}
+ 23%|█████████████████▎ | 253/1110 [1:35:47<6:01:50, 25.33s/it]
+ 23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
+{'loss': 4.2108, 'learning_rate': 0.0001524, 'epoch': 2.3}
+ 23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
+ 23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
+{'loss': 4.1417, 'learning_rate': 0.00015299999999999998, 'epoch': 2.31}
+ 23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
+{'loss': 4.2294, 'learning_rate': 0.0001536, 'epoch': 2.32}
+ 23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
+{'loss': 4.1063, 'learning_rate': 0.00015419999999999998, 'epoch': 2.33}
+ 23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
+ 23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
+ 23%|█████████████████▊ | 260/1110 [1:38:45<5:53:06, 24.92s/it]
+{'loss': 4.0417, 'learning_rate': 0.0001548, 'epoch': 2.34}
+ 23%|█████████████████▊ | 260/1110 [1:38:45<5:53:06, 24.92s/it]
+{'loss': 3.9855, 'learning_rate': 0.00015539999999999998, 'epoch': 2.35}
+ 23%|█████████████████▊ | 260/1110 [1:38:45<5:53:06, 24.92s/it]
+ 24%|█████████████████▉ | 262/1110 [1:39:34<5:49:19, 24.72s/it]
+{'loss': 3.9049, 'learning_rate': 0.000156, 'epoch': 2.36}
+ 24%|█████████████████▉ | 262/1110 [1:39:34<5:49:19, 24.72s/it]
+ 24%|█████████████████▉ | 262/1110 [1:39:34<5:49:19, 24.72s/it]
+ 24%|██████████████████ | 263/1110 [1:39:57<5:41:01, 24.16s/it]
+{'loss': 3.9212, 'learning_rate': 0.00015659999999999998, 'epoch': 2.37}
+ 24%|██████████████████ | 263/1110 [1:39:57<5:41:01, 24.16s/it]
+ 24%|██████████████████ | 263/1110 [1:39:57<5:41:01, 24.16s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8797, 'learning_rate': 0.0001572, 'epoch': 2.38} + 24%|██████████████████ | 263/1110 [1:39:57<5:41:01, 24.16s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|██████████████████ | 263/1110 [1:39:57<5:41:01, 24.16s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|██████████████████ | 263/1110 [1:39:57<5:41:01, 24.16s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:11,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:11,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:11,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:11,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:42:20,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:42:20,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|██████████████████▏ | 265/1110 [1:40:40<5:20:47, 22.78s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|██████████████████▏ | 265/1110 [1:40:40<5:20:47, 22.78s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9043, 'learning_rate': 0.0001578, 'epoch': 2.39} + 24%|██████████████████▏ | 265/1110 [1:40:40<5:20:47, 22.78s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|██████████████████▏ | 265/1110 [1:40:40<5:20:47, 22.78s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:32,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:32,597 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:36,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:36,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:36,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:42,622 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|██████████████████▏ | 266/1110 [1:41:00<5:09:43, 22.02s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 24%|██████████████████▏ | 266/1110 [1:41:00<5:09:43, 22.02s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8784, 'learning_rate': 0.0001584, 'epoch': 2.39} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:48,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:48,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:42:52,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:42:52,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:57,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:59,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:59,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:43:03,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.8376, 'learning_rate': 0.000159, 'epoch': 2.4} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:07,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:09,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:11,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:43:15,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:19,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:21,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:23,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:25,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:27,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:29,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:31,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:33,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:34,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:38,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:39,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:41,540 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:43,108 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:46,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:47,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:50,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:51,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:54,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:56,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:58,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:00,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:02,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:04,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:05,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:08,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:12,557 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:16,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:19,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:23,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:26,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:30,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:33,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.6041, 'learning_rate': 0.0001626, 'epoch': 2.46} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:37,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:41,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:44,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:48,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:51,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:55,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.8567, 'learning_rate': 0.0001632, 'epoch': 2.47}
+{'loss': 5.6862, 'learning_rate': 0.0001638, 'epoch': 2.48}
+ 25%|██████████████████▉ | 276/1110 [1:44:15<5:36:11, 24.19s/it]
+{'loss': 5.4036, 'learning_rate': 0.0001644, 'epoch': 2.48}
+ 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] +{'loss': 4.9913, 'learning_rate': 0.000165, 'epoch': 2.49}
+{'loss': 4.7303, 'learning_rate': 0.0001656, 'epoch': 2.5}
+ 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5465, 'learning_rate': 0.0001662, 'epoch': 2.51} + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.33, 'learning_rate': 0.0001668, 'epoch': 2.52} + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2422, 'learning_rate': 0.0001674, 'epoch': 2.53} + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2522, 'learning_rate': 0.000168, 'epoch': 2.54} + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1028, 'learning_rate': 0.00016919999999999997, 'epoch': 2.56} + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it] Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0649, 'learning_rate': 0.00017039999999999997, 'epoch': 2.57} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0507, 'learning_rate': 0.00017099999999999998, 'epoch': 2.58} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.921, 'learning_rate': 0.00017159999999999997, 'epoch': 2.59} +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:18,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:18,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:18,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:18,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9083, 'learning_rate': 0.00017219999999999998, 'epoch': 2.6} +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:41,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:41,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:51:41,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8807, 'learning_rate': 0.00017279999999999997, 'epoch': 2.61} +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:41,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:41,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:51,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:51,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:51,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:51:51,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:51:51,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:01,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:01,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.9007, 'learning_rate': 0.00017339999999999996, 'epoch': 2.62} +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:01,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:07,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:07,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:12,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:12,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:12,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:17,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:17,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:17,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:21,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:21,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:25,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:28,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:30,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:32,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:34,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:34,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:38,423 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:52:40,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:40,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:42,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:44,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:46,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:48,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:49,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:52:51,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:54,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:54,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:56,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:58,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:52:59,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:02,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:53:03,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:03,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:06,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:07,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:10,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:12,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:14,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:53:14,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:16,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:18,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:21,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:22,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:22,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4456, 'learning_rate': 0.00017699999999999997, 'epoch': 2.67} +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:26,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:53:26,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:30,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:30,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:33,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:37,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:37,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:40,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:53:40,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:44,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:44,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:47,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:51,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:51,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.4417, 'learning_rate': 0.00017759999999999998, 'epoch': 2.68} +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:55,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:53:55,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:53:58,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:01,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:01,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:05,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:05,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:08,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:54:08,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:12,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.4952, 'learning_rate': 0.00017819999999999997, 'epoch': 2.69} +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.2771, 'learning_rate': 0.00017879999999999998, 'epoch': 2.7} + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.9689, 'learning_rate': 0.00017939999999999997, 'epoch': 2.71} + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
+ 27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]
+{'loss': 4.709, 'learning_rate': 0.00017999999999999998, 'epoch': 2.72}
+ 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]
+{'loss': 4.3383, 'learning_rate': 0.00018119999999999999, 'epoch': 2.74}
+{'loss': 4.2212, 'learning_rate': 0.00018179999999999997, 'epoch': 2.74}
+{'loss': 4.2254, 'learning_rate': 0.0001824, 'epoch': 2.75}
+{'loss': 4.2006, 'learning_rate': 0.00018299999999999998, 'epoch': 2.76}
+ 28%|█████████████████████ | 308/1110 [1:56:34<5:41:12, 25.53s/it]
+{'loss': 4.0235, 'learning_rate': 0.00018419999999999998, 'epoch': 2.78}
+{'loss': 4.0431, 'learning_rate': 0.0001848, 'epoch': 2.79}
+ 28%|█████████████████████▎ | 311/1110 [1:57:46<5:26:21, 24.51s/it]
+{'loss': 3.9653, 'learning_rate': 0.00018539999999999998, 'epoch': 2.8}
+[WARNING|modeling_utils.py:388] 2022-03-28 18:59:34,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9405, 'learning_rate': 0.000186, 'epoch': 2.81}
+{'loss': 3.8778, 'learning_rate': 0.00018659999999999998, 'epoch': 2.82}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:00:37,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.8905, 'learning_rate': 0.0001872, 'epoch': 2.83}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:00:59,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.8976, 'learning_rate': 0.00018779999999999998, 'epoch': 2.83}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:03,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:01:15,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.8547, 'learning_rate': 0.00018839999999999997, 'epoch': 2.84} +[WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:01:30,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:01:30,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:01:30,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:36,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:01:40,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.8121, 'learning_rate': 0.00018899999999999999, 'epoch': 2.85}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:44,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:46,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:48,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:01:52,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 19:01:58,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:00,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:02,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:04,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:06,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:08,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:10,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:11,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:13,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:15,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:18,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:20,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:24,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:25,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:28,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:30,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:33,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:35,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:37,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:39,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:40,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:42,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:44,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:48,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:51,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:55,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:02:58,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:02,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:05,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:09,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:12,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:16,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:19,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:23,255 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:26,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:30,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:03:33,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 4.9258, 'learning_rate': 0.00019319999999999998, 'epoch': 2.91}
+ 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]
+{'loss': 4.2863, 'learning_rate': 0.00019439999999999998, 'epoch': 2.93}
+{'loss': 4.1943, 'learning_rate': 0.000195, 'epoch': 2.94}
+ 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.2371, 'learning_rate': 0.00019559999999999998, 'epoch': 2.95} + 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 30%|██████████████████████▌ | 329/1110 [2:04:01<5:09:36, 23.79s/it]
+{'loss': 4.0108, 'learning_rate': 0.0001962, 'epoch': 2.96}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:05:49,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:05:53,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:05:57,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9665, 'learning_rate': 0.00019679999999999999, 'epoch': 2.97}
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:06:11,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:17,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:23,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:25,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.9371, 'learning_rate': 0.0001974, 'epoch': 2.98}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:31,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:34,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:36,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:38,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:40,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:41,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:43,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:45,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:47,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:50,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:51,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:56,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:57,852 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:06:58,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:01,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:04,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:08,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:11,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:15,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:19,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:22,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.1462, 'learning_rate': 0.0001992, 'epoch': 3.01}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:22,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:07:22,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:07:22,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:07:22,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.3462, 'learning_rate': 0.0001998, 'epoch': 3.02} + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1581, 'learning_rate': 0.0002004, 'epoch': 3.03} + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1832, 'learning_rate': 0.000201, 'epoch': 3.04} + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.1241, 'learning_rate': 0.0002016, 'epoch': 3.04}
+{'loss': 3.9371, 'learning_rate': 0.0002022, 'epoch': 3.05}
+{'loss': 3.8334, 'learning_rate': 0.0002028, 'epoch': 3.06}
+{'loss': 3.7693, 'learning_rate': 0.00020339999999999998, 'epoch': 3.07}
+{'loss': 3.7465, 'learning_rate': 0.000204, 'epoch': 3.08}
+{'loss': 3.6217, 'learning_rate': 0.00020459999999999999, 'epoch': 3.09}
+ 31%|███████████████████████▌ | 344/1110 [2:10:11<5:30:09, 25.86s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.6467, 'learning_rate': 0.0002052, 'epoch': 3.1}
+ 31%|███████████████████████▌ | 345/1110 [2:10:35<5:24:09, 25.42s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.5401, 'learning_rate': 0.0002058, 'epoch': 3.11}
+{'loss': 3.5786, 'learning_rate': 0.00020639999999999998, 'epoch': 3.12}
+ 31%|███████████████████████▊ | 347/1110 [2:11:23<5:12:29, 24.57s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.4404, 'learning_rate': 0.00020699999999999996, 'epoch': 3.13}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:13:16,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 3.3864, 'learning_rate': 0.00020759999999999998, 'epoch': 3.13}
+{'loss': 3.2147, 'learning_rate': 0.00020819999999999996, 'epoch': 3.14}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:13:16,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:13:16,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:13:16,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:13:16,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:13:16,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:18,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:18,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.34, 'learning_rate': 0.00020939999999999997, 'epoch': 3.16} + 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:54,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:14:54,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 352/1110 [2:13:15<4:40:43, 22.22s/it]g-point operations will not be computed-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 32%|████████████████████████ | 352/1110 [2:13:15<4:40:43, 22.22s/it]g-point operations will not be computed-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2556, 'learning_rate': 0.00020999999999999998, 'epoch': 3.17} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:03,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:03,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:15:07,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:11,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 19:15:15,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:15:17,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.2095, 'learning_rate': 0.00021059999999999997, 'epoch': 3.18}
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:21,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:25,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:27,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:29,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:31,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:33,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 32%|████████████████████████▏ | 354/1110 [2:13:51<4:12:54, 20.07s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:15:35,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:37,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:39,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:41,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:42,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:44,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:47,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:49,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 32%|████████████████████████▎ | 355/1110 [2:14:07<3:56:58, 18.83s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:15:51,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:53,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:56,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:57,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:58,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:01,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:04,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:05,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:07,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:09,814 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 32%|████████████████████████▍ | 357/1110 [2:14:28<3:01:06, 14.43s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:16:12,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:13,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:16,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:18,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 32%|████████████████████████▌ | 358/1110 [2:14:35<2:32:52, 12.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:16:19,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:23,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:27,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:30,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:34,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:38,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:41,557 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:45,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 32%|████████████████████████▌ | 359/1110 [2:15:04<3:35:15, 17.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:52,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:55,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:59,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:02,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:06,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.7524, 'learning_rate': 0.00021479999999999996, 'epoch': 3.24}
+{'loss': 5.2796, 'learning_rate': 0.00021539999999999998, 'epoch': 3.25}
+{'loss': 4.8365, 'learning_rate': 0.00021599999999999996, 'epoch': 3.26}
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.4809, 'learning_rate': 0.00021659999999999998, 'epoch': 3.27} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1632, 'learning_rate': 0.00021719999999999997, 'epoch': 3.28} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.9396, 'learning_rate': 0.00021839999999999997, 'epoch': 3.3}
+ 33%|█████████████████████████▏ | 367/1110 [2:18:38<5:15:50, 25.51s/it]
+{'loss': 3.8539, 'learning_rate': 0.00021899999999999998, 'epoch': 3.3}
+ 33%|█████████████████████████▏ | 368/1110 [2:19:05<5:20:50, 25.94s/it]
+{'loss': 3.7321, 'learning_rate': 0.00021959999999999997, 'epoch': 3.31}
+{'loss': 3.6681, 'learning_rate': 0.00022019999999999999, 'epoch': 3.32}
+{'loss': 3.6725, 'learning_rate': 0.00022079999999999997, 'epoch': 3.33}
+ 33%|█████████████████████████▍ | 371/1110 [2:20:18<5:07:02, 24.93s/it]
+{'loss': 3.5996, 'learning_rate': 0.0002214, 'epoch': 3.34}
+{'loss': 3.502, 'learning_rate': 0.00022199999999999998, 'epoch': 3.35}
+ 34%|█████████████████████████▌ | 373/1110 [2:21:05<4:56:06, 24.11s/it]
+ 34%|█████████████████████████▌ | 373/1110 [2:21:05<4:56:06, 24.11s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3827, 'learning_rate': 0.00022319999999999998, 'epoch': 3.37} + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3173, 'learning_rate': 0.0002238, 'epoch': 3.38} + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+ 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it] Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:23:50,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:23:50,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:23:50,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:23:56,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:23:56,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 3.315, 'learning_rate': 0.00022439999999999998, 'epoch': 3.39} +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:00,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:00,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:00,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:00,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:00,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:11,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:11,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:24:11,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:17,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:17,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3281, 'learning_rate': 0.000225, 'epoch': 3.39} +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:17,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:17,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:24:25,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:24:25,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:24:29,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:29,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:24:33,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:24:33,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:37,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:37,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.27, 'learning_rate': 0.00022559999999999998, 'epoch': 3.4} +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:41,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:24:43,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:45,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:47,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:49,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:51,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:51,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:53,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:24:56,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:57,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:24:59,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:25:01,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:25:03,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:25:03,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:08,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:10,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:10,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:11,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:13,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:15,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:18,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:19,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:22,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:22,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:23,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:26,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:28,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:30,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:30,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:31,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:34,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:36,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:38,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:38,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:38,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:42,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:42,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:45,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:45,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:49,602 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:53,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:53,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:56,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:56,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:00,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:00,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:03,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:03,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:03,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:07,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:10,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:10,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:14,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:14,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:17,983 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:24,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.2546, 'learning_rate': 0.00022979999999999997, 'epoch': 3.47}
+{'loss': 4.9179, 'learning_rate': 0.0002304, 'epoch': 3.48}
+{'loss': 4.6224, 'learning_rate': 0.00023099999999999998, 'epoch': 3.48}
+{'loss': 4.3587, 'learning_rate': 0.0002316, 'epoch': 3.49}
+{'loss': 4.0705, 'learning_rate': 0.00023219999999999998, 'epoch': 3.5}
+{'loss': 3.9197, 'learning_rate': 0.0002328, 'epoch': 3.51}
+{'loss': 3.8546, 'learning_rate': 0.00023339999999999998, 'epoch': 3.52}
+{'loss': 3.7795, 'learning_rate': 0.000234, 'epoch': 3.53}
+{'loss': 3.7303, 'learning_rate': 0.00023459999999999998, 'epoch': 3.54}
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:24,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7024, 'learning_rate': 0.0002352, 'epoch': 3.55} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5815, 'learning_rate': 0.00023579999999999999, 'epoch': 3.56} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5363, 'learning_rate': 0.0002364, 'epoch': 3.57} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.494, 'learning_rate': 0.000237, 'epoch': 3.57} +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it]
+{'loss': 3.3856, 'learning_rate': 0.0002376, 'epoch': 3.58} + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it]
+ 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] + 36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it] +[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:36,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:36,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:40,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:40,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:40,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:40,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:40,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:50,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:50,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:54,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:54,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:32:54,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:58,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:58,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:58,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:58,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:32:58,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:08,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:08,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:08,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:08,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:16,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:16,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.311, 'learning_rate': 0.0002394, 'epoch': 3.61} +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:16,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:16,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:33:16,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:27,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:27,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:27,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:33,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:33,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 36%|███████████████████████████▌ | 402/1110 [2:31:53<4:19:20, 21.98s/it]
+ 36%|███████████████████████████▌ | 402/1110 [2:31:53<4:19:20, 21.98s/it] +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:39,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:39,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:43,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:43,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:47,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:47,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:51,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:54,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 36%|███████████████████████████▌ | 403/1110 [2:32:12<4:07:22, 20.99s/it] + 36%|███████████████████████████▌ | 403/1110 [2:32:12<4:07:22, 20.99s/it] +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:57,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:33:57,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:01,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:03,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:05,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:07,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:09,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:09,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:11,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:13,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:15,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:17,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:19,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:21,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:22,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:26,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:26,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:27,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:29,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:32,493 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:33,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:35,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:38,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:38,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:39,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:41,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:44,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:46,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:48,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:50,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:52,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:54,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:55,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:58,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:02,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:06,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:09,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:13,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:16,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:20,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:23,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:27,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:30,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:34,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:37,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:41,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:44,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+{'loss': 5.6085, 'learning_rate': 0.0002448, 'epoch': 3.69}
+{'loss': 5.2909, 'learning_rate': 0.00024539999999999995, 'epoch': 3.7}
+{'loss': 4.7823, 'learning_rate': 0.00024599999999999996, 'epoch': 3.71}
+ 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] +{'loss': 4.4985, 'learning_rate': 0.0002466, 'epoch': 3.72}
+{'loss': 4.2019, 'learning_rate': 0.0002472, 'epoch': 3.73}
+{'loss': 4.024, 'learning_rate': 0.00024779999999999995, 'epoch': 3.74}
+{'loss': 3.8789, 'learning_rate': 0.00024839999999999997, 'epoch': 3.74}
+ 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7719, 'learning_rate': 0.000249, 'epoch': 3.75} + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.773, 'learning_rate': 0.00024959999999999994, 'epoch': 3.76} + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.5885, 'learning_rate': 0.00025019999999999996, 'epoch': 3.77} + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|██████���█████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 3.5845, 'learning_rate': 0.00025079999999999997, 'epoch': 3.78} + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.523, 'learning_rate': 0.0002514, 'epoch': 3.79} + 38%|███████████████████████���████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4934, 'learning_rate': 0.00025199999999999995, 'epoch': 3.8} + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it] Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.4074, 'learning_rate': 0.00025259999999999996, 'epoch': 3.81} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3231, 'learning_rate': 0.0002532, 'epoch': 3.82} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:42:09,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:42:09,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3202, 'learning_rate': 0.0002538, 'epoch': 3.83} + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████���███████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +{'loss': 3.2468, 'learning_rate': 0.00025439999999999995, 'epoch': 3.83} + 38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:42:42,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:42:42,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:42:42,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:42:42,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:42:50,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:42:50,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:42:50,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:42:56,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:42:56,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2139, 'learning_rate': 0.00025499999999999996, 'epoch': 3.84} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:01,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:01,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:43:05,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:43:05,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:09,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:09,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:43:13,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:43:13,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:43:13,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:17,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:19,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:19,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:43:23,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:43:23,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:27,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:29,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:31,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:31,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:33,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:35,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:37,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:38,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:40,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:42,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:45,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:47,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:47,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:49,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:50,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:52,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:55,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:56,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:59,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:59,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:00,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:03,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:05,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:07,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:10,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:10,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:11,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:13,613 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:15,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:15,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.1404, 'learning_rate': 0.0002586, 'epoch': 3.9} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:19,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:19,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:22,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:26,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:26,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:29,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:29,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:33,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:36,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:36,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:40,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:40,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:43,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:43,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.0964, 'learning_rate': 0.00025919999999999996, 'epoch': 3.91} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:47,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:47,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:50,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:54,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:54,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:57,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:01,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:01,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 5.1443, 'learning_rate': 0.00025979999999999997, 'epoch': 3.91} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.5983, 'learning_rate': 0.0002604, 'epoch': 3.92} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.1566, 'learning_rate': 0.000261, 'epoch': 3.93} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7846, 'learning_rate': 0.00026159999999999996, 'epoch': 3.94} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.6671, 'learning_rate': 0.0002622, 'epoch': 3.95} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 40%|██████████████████████████████▏ | 440/1110 [2:45:34<4:25:55, 23.81s/it] +{'loss': 3.4806, 'learning_rate': 0.0002628, 'epoch': 3.96}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:47:38,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 3.5046, 'learning_rate': 0.00026339999999999995, 'epoch': 3.97} +[WARNING|modeling_utils.py:388] 2022-03-28 19:47:42,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:47:49,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:47:55,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 3.3366, 'learning_rate': 0.00026399999999999997, 'epoch': 3.98} +[WARNING|modeling_bart.py:1051] 2022-03-28 19:48:03,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:09,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:11,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:13,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:15,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:17,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:18,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:20,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:23,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:24,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:27,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:29,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:31,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:31,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:34,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:38,229 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:41,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:45,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:48,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:52,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 5.0943, 'learning_rate': 0.00026579999999999996, 'epoch': 4.01}
+{'loss': 3.9256, 'learning_rate': 0.00026639999999999997, 'epoch': 4.02}
+{'loss': 3.7867, 'learning_rate': 0.000267, 'epoch': 4.03}
+{'loss': 3.6839, 'learning_rate': 0.0002676, 'epoch': 4.04}
+{'loss': 3.6042, 'learning_rate': 0.00026819999999999996, 'epoch': 4.04}
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3529, 'learning_rate': 0.0002688, 'epoch': 4.05} +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.3266, 'learning_rate': 0.0002694, 'epoch': 4.06} +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.2218, 'learning_rate': 0.00027, 'epoch': 4.07} +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.1004, 'learning_rate': 0.00027059999999999996, 'epoch': 4.08} +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 41%|███████████████████████████████ | 454/1110 [2:51:17<4:41:14, 25.72s/it]
+{'loss': 3.0669, 'learning_rate': 0.0002712, 'epoch': 4.09}
+ 41%|███████████████████████████████▏ | 455/1110 [2:51:44<4:43:59, 26.01s/it]
+{'loss': 2.8998, 'learning_rate': 0.0002718, 'epoch': 4.1}
+{'loss': 2.8966, 'learning_rate': 0.0002724, 'epoch': 4.11}
+ 41%|███████████████████████████████▎ | 457/1110 [2:52:33<4:33:40, 25.15s/it]
+{'loss': 2.6145, 'learning_rate': 0.0002736, 'epoch': 4.13}
+{'loss': 2.5475, 'learning_rate': 0.0002742, 'epoch': 4.13}
+{'loss': 2.5106, 'learning_rate': 0.0002742, 'epoch': 4.14}
+ 41%|███████████████████████████████▎ | 457/1110 [2:52:33<4:33:40, 25.15s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 41%|███████████████████████████████▎ | 457/1110 [2:52:33<4:33:40, 25.15s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:43,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:43,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.4872, 'learning_rate': 0.0002748, 'epoch': 4.15} +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|███████████████████████████████▋ | 462/1110 [2:54:27<4:08:29, 23.01s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|███████████████████████████████▋ | 462/1110 [2:54:27<4:08:29, 23.01s/it]g-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:13,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:13,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:13,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:56:13,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:13,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:24,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:24,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:24,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:30,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:30,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.2508, 'learning_rate': 0.000276, 'epoch': 4.17} +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:30,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:42,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:42,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:45,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:56:48,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:48,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:56:48,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:56:52,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:56:54,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:56:57,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:56:57,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:57:00,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:57:00,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:04,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|███████████████████████████████▊ | 465/1110 [2:55:24<3:35:18, 20.03s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:10,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:12,373 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:14,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:16,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:17,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:19,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:19,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|███████████████████████████████▉ | 466/1110 [2:55:39<3:17:56, 18.44s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:24,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:26,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:29,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:30,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:30,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:35,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:35,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|███████████████████████████████▉ | 467/1110 [2:55:52<3:01:49, 16.97s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:57:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:39,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:40,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:42,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:44,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:44,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:46,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:45,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:49,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:45,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:50,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:45,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:50,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:45,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████ | 469/1110 [2:56:08<2:11:32, 12.31s/it] Setting `use_cache=False`...1] 2022-03-28 19:57:45,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████ | 469/1110 [2:56:08<2:11:32, 12.31s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:57,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:57,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:00,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:00,878 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:04,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:04,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:07,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:11,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:11,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:15,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:15,031 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:18,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████▏ | 470/1110 [2:56:37<3:04:09, 17.26s/it] Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████▏ | 470/1110 [2:56:37<3:04:09, 17.26s/it] Setting `use_cache=False`...1] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████▏ | 470/1110 [2:56:37<3:04:09, 17.26s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:25,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:25,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:29,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:29,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:32,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:36,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:36,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:39,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:42,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:42,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:42,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████▏ | 471/1110 [2:57:05<3:37:23, 20.41s/it] Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████▏ | 471/1110 [2:57:05<3:37:23, 20.41s/it] Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.6758, 'learning_rate': 0.0002808, 'epoch': 4.24} + 42%|████████████████████████████████▏ | 471/1110 [2:57:05<3:37:23, 20.41s/it] Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 42%|████████████████████████████████▏ | 471/1110 [2:57:05<3:37:23, 20.41s/it] Setting `use_cache=False`...1] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 4.0751, 'learning_rate': 0.00028139999999999996, 'epoch': 4.25} +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.7796, 'learning_rate': 0.00028199999999999997, 'epoch': 4.26} +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 3.4917, 'learning_rate': 0.0002826, 'epoch': 4.27} +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.2089, 'learning_rate': 0.00028319999999999994, 'epoch': 4.28} +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 3.0105, 'learning_rate': 0.00028379999999999996, 'epoch': 4.29} + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.9742, 'learning_rate': 0.0002844, 'epoch': 4.3} + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.8292, 'learning_rate': 0.000285, 'epoch': 4.3} + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.7116, 'learning_rate': 0.00028559999999999995, 'epoch': 4.31} + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.5631, 'learning_rate': 0.00028619999999999996, 'epoch': 4.32} + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +{'loss': 2.5057, 'learning_rate': 0.0002868, 'epoch': 4.33} + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+ 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 2.4719, 'learning_rate': 0.00028739999999999994, 'epoch': 4.34} + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... + 43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 2.2309, 'learning_rate': 0.00028799999999999995, 'epoch': 4.35} +[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 2.2296, 'learning_rate': 0.00028859999999999997, 'epoch': 4.36} +[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 1.9529, 'learning_rate': 0.0002892, 'epoch': 4.37} +[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:05:08,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:05:26,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 1.9794, 'learning_rate': 0.00029039999999999996, 'epoch': 4.39} +[WARNING|modeling_utils.py:388] 2022-03-28 20:05:34,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:05:40,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:05:46,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:05:46,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 44%|█████████████████████████████████▍ | 488/1110 [3:04:07<3:48:20, 22.03s/it] +[WARNING|modeling_utils.py:388] 2022-03-28 20:05:53,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 20:05:57,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_utils.py:388] 2022-03-28 20:06:05,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:07,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:11,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:13,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
+[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:15,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:18,020 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:21,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:23,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:25,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:06:27,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:29,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:31,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:33,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:35,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:36,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:06:38,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:42,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:43,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:45,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:48,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:49,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:06:49,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:54,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:55,572 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:06:58,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:00,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:02,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:07:02,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:03,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:05,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:08,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:09,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:10,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:07:13,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:20,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:24,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-28 20:07:27,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:31,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:35,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:38,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 6.105, 'learning_rate': 0.00029519999999999997, 'epoch': 4.46} +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:42,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:42,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:45,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:45,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:49,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+{'loss': 4.909, 'learning_rate': 0.0002958, 'epoch': 4.47}
+{'loss': 3.761, 'learning_rate': 0.0002964, 'epoch': 4.48}
+{'loss': 3.2873, 'learning_rate': 0.00029699999999999996, 'epoch': 4.48}
+ 45%|██████████████████████████████████▏ | 499/1110 [3:07:49<4:19:53, 25.52s/it]
+{'loss': 2.961, 'learning_rate': 0.00029759999999999997, 'epoch': 4.49}
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... +[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
+03/28/2022 20:15:36 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow +{'eval_loss': 2.6299471855163574, 'eval_wer': 1.4451408171360571, 'eval_runtime': 336.2822, 'eval_samples_per_second': 7.856, 'eval_steps_per_second': 0.494, 'epoch': 4.5} +03/28/2022 20:16:48 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220328_170142-by95ehra/run-by95ehra.wandb']. This may take a bit of time if the files are large.