0%| | 0/1110 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:01:45,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:01:47,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:01:47,859 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:01:49,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:01:49,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:01:51,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:01:51,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:01:53,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:01:53,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:01:54,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:01:55,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:01:56,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:01:57,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:01:58,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:01:59,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:00,713 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:01,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:02,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:03,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:04,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:05,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:06,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:07,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:09,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:02:09,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:11,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:11,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:12,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:13,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 10.1809, 'learning_rate': 0.0, 'epoch': 0.01} [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:14,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:15,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 1/1110 [00:31<9:51:22, 32.00s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:02:16,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:17,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:18,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:19,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:20,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:02:21,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:22,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:22,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:24,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:24,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:26,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:26,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:27,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:28,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:29,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:30,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:31,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:02:32,323 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:33,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:34,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:35,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:35,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:37,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:37,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:38,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:39,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:40,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:41,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:42,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:02:43,241 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:44,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 10.3677, 'learning_rate': 6e-07, 'epoch': 0.02} [WARNING|modeling_utils.py:388] 2022-03-28 17:02:45,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 2/1110 [01:02<9:29:50, 30.86s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:02:46,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:47,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:48,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:49,279 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:50,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:51,064 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:52,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:52,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:54,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:02:54,655 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:55,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:56,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:57,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:02:58,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:02:59,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:00,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:01,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:01,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:03,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:03,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:04,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:03:05,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:06,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:07,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:08,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:09,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:10,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:10,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:12,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:12,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:13,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:14,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 3/1110 [01:30<9:12:44, 29.96s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:03:15,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:03:16,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:17,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:18,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:19,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:19,825 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:20,985 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:21,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:22,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:23,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:24,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:25,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:26,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:03:26,860 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:28,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:28,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:29,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:30,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:31,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:32,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:33,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:34,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:35,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:35,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:36,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:03:37,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:38,729 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:39,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:40,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:41,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:42,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:42,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 4/1110 [01:59<9:01:46, 29.39s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:03:44,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 10.153, 'learning_rate': 1.2e-06, 'epoch': 0.04} [WARNING|modeling_utils.py:388] 2022-03-28 17:03:44,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:45,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:46,541 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:47,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:03:48,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:49,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:50,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:51,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:51,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:52,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:53,501 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:54,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:55,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:56,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:03:56,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:58,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:03:58,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:03:59,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:00,511 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:01,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:02,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:03,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:03,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:05,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:05,727 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:06,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:07,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:08,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:04:09,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:10,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 10.1314, 'learning_rate': 1.8e-06, 'epoch': 0.04} [WARNING|modeling_utils.py:388] 2022-03-28 17:04:10,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 5/1110 [02:27<8:52:18, 28.90s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:04:12,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:12,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:13,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:14,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:15,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:16,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:17,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:17,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:19,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:04:19,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:20,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:21,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:22,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:23,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:24,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:24,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:25,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:26,490 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:27,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:28,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:29,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:04:29,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:31,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:31,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:32,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:33,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:34,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:35,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:36,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:36,735 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:37,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:38,475 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▍ | 6/1110 [02:55<8:43:03, 28.43s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:04:39,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:04:40,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:41,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:41,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:43,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:43,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:44,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:47,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:48,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:48,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:50,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:50,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:51,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:04:52,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:53,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:53,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:55,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:55,702 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:56,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:57,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:04:58,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:04:59,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:00,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:00,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:01,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:05:02,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:03,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:04,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:05,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:05,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:06,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:07,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▌ | 7/1110 [03:24<8:46:25, 28.64s/it] 1%|▌ | 7/1110 [03:24<8:46:25, 28.64s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:05:08,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:09,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:10,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:11,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:12,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:05:12,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:13,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:14,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:15,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:16,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:17,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:17,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:18,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:19,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:20,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:20,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:22,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:05:22,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:23,741 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:24,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:25,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:25,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:27,033 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:27,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:28,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:29,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:30,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:30,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:31,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:05:32,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 9.0158, 'learning_rate': 3.6e-06, 'epoch': 0.07} [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:33,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:34,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▌ | 8/1110 [03:50<8:34:52, 28.03s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:05:35,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:36,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:37,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:37,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:38,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:39,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:40,401 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:40,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:42,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:05:42,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:43,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:44,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:45,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:45,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:47,056 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:47,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:48,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:49,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:50,335 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:50,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:51,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:05:52,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:53,598 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:54,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:55,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:55,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:56,900 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:57,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:05:58,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:05:59,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:00,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:00,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 9/1110 [04:17<8:25:28, 27.55s/it] 1%|▋ | 9/1110 [04:17<8:25:28, 27.55s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:06:01,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:06:02,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:03,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:04,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:05,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:05,661 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:06,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:07,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:08,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:08,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:09,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:10,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:11,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:06:12,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:13,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:13,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:14,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:15,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:16,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:17,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:18,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:18,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:19,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:20,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:21,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:06:21,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:22,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:23,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:24,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:25,144 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:26,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 8.2283, 'learning_rate': 4.8e-06, 'epoch': 0.09} [WARNING|modeling_utils.py:388] 2022-03-28 17:06:26,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▋ | 10/1110 [04:43<8:16:25, 27.08s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:06:27,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:28,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:29,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:30,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:31,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:06:31,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:32,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:33,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:34,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:34,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:35,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:36,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:37,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:38,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:39,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:39,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:40,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:06:41,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:42,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:43,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:44,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:44,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:45,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:46,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:47,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:47,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:48,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:49,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:50,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:06:50,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:51,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:52,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 1%|▊ | 11/1110 [05:09<8:08:39, 26.68s/it] 1%|▊ | 11/1110 [05:09<8:08:39, 26.68s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:06:54,256 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:56,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:06:56,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:00,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:03,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:03,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:06,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:06,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:09,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:12,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:12,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:15,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:06:53,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|▊ | 12/1110 [05:34<8:00:37, 26.26s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▊ | 12/1110 [05:34<8:00:37, 26.26s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 7.5517, 'learning_rate': 5.999999999999999e-06, 'epoch': 0.11} [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:22,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:25,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:25,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:28,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:33,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:33,411 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:36,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:39,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:39,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:42,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:18,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 13/1110 [06:01<8:03:48, 26.46s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▉ | 13/1110 [06:01<8:03:48, 26.46s/it][WARNING|modeling_bart.py:1051] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 7.3315, 'learning_rate': 6.599999999999999e-06, 'epoch': 0.12} [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:49,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:07:52,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:52,049 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:55,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:58,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:07:58,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:08:01,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:08:04,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 17:07:45,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:07,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|▉ | 14/1110 [06:25<7:52:43, 25.88s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:10,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:13,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:16,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:19,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:22,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:25,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:28,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:31,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|█ | 15/1110 [06:49<7:41:31, 25.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:34,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:37,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:40,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:43,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:46,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:49,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:51,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:54,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
1%|█▏ | 16/1110 [07:13<7:32:20, 24.81s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:08:58,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.7689, 'learning_rate': 8.4e-06, 'epoch': 0.14}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:00,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:03,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:06,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:09,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:12,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:15,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:18,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
2%|█▏ | 17/1110 [07:36<7:22:21, 24.28s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:23,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:26,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:29,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:32,034 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:09:34,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.3901, 'learning_rate': 9.6e-06, 'epoch': 0.16}
2%|█▎ | 19/1110 [08:21<7:05:40, 23.41s/it]
{'loss': 6.3045, 'learning_rate': 1.02e-05, 'epoch': 0.17}
[WARNING|modeling_utils.py:388] 2022-03-28 17:10:11,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:15,938 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:10:19,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:24,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.1671, 'learning_rate': 1.0799999999999998e-05, 'epoch': 0.18}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:30,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:32,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:10:36,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:10:38,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:10:40,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:10:42,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.9445, 'learning_rate': 1.14e-05, 'epoch': 0.19}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:46,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:48,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:50,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:52,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:53,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:55,767 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:10:57,549 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:00,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:02,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:04,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:07,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:08,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:11,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:12,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:14,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:17,216 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:19,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:23,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:25,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:26,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.7463, 'learning_rate': 1.3799999999999998e-05, 'epoch': 0.22}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:31,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:35,587 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:39,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:42,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:46,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:50,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:53,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:11:57,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.6185, 'learning_rate': 1.44e-05, 'epoch': 0.23}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:01,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:04,890 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:08,465 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.5114, 'learning_rate': 1.4999999999999999e-05, 'epoch': 0.24}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:09:20,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:12:12,079 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
3%|█▉ | 28/1110 [11:11<7:02:11, 23.41s/it]
{'loss': 5.3849, 'learning_rate': 1.5599999999999996e-05, 'epoch': 0.25}
{'loss': 5.3192, 'learning_rate': 1.6199999999999997e-05, 'epoch': 0.26}
{'loss': 5.2528, 'learning_rate': 1.68e-05, 'epoch': 0.27}
{'loss': 5.1951, 'learning_rate': 1.74e-05, 'epoch': 0.28}
{'loss': 5.1437, 'learning_rate': 1.7999999999999997e-05, 'epoch': 0.29}
3%|██▎ | 33/1110 [13:30<8:00:54, 26.79s/it]
{'loss': 5.1332, 'learning_rate': 1.8599999999999998e-05, 'epoch': 0.3}
{'loss': 5.082, 'learning_rate': 1.92e-05, 'epoch': 0.3}
{'loss': 4.9941, 'learning_rate': 1.98e-05, 'epoch': 0.31}
{'loss': 4.977, 'learning_rate': 2.04e-05, 'epoch': 0.32}
{'loss': 4.9351, 'learning_rate': 2.1e-05, 'epoch': 0.33}
{'loss': 4.9614, 'learning_rate': 2.1599999999999996e-05, 'epoch': 0.34}
{'loss': 4.8991, 'learning_rate': 2.2199999999999998e-05, 'epoch': 0.35}
4%|██▊ | 40/1110 [16:25<7:22:29, 24.81s/it]
{'loss': 4.9255, 'learning_rate': 2.28e-05, 'epoch': 0.36}
[WARNING|modeling_utils.py:388] 2022-03-28 17:18:13,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:18:28,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8665, 'learning_rate': 2.34e-05, 'epoch': 0.37}
{'loss': 4.8518, 'learning_rate': 2.3999999999999997e-05, 'epoch': 0.38}
[WARNING|modeling_utils.py:388] 2022-03-28 17:19:11,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
  4%|███ | 43/1110 [17:32<6:49:55, 23.05s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 17:19:19,064 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:19:22,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:19:29,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8202, 'learning_rate': 2.52e-05, 'epoch': 0.39}
[WARNING|modeling_utils.py:388] 2022-03-28 17:19:43,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:19:49,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:19:53,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:19:55,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
  4%|███▏ | 45/1110 [18:14<6:24:29, 21.66s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9285, 'learning_rate': 2.5799999999999997e-05, 'epoch': 0.4}
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:02,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:04,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:06,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:08,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:10,422 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:14,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:16,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:18,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:19,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:21,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:23,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:25,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:28,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:30,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:31,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:34,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:36,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:38,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:40,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:42,723 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:44,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:46,068 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:48,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:50,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:51,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:54,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:55,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:20:57,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:01,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:05,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:08,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:12,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:16,352 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:19,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:23,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:27,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.2754, 'learning_rate': 2.94e-05, 'epoch': 0.46}
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:30,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:34,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:37,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:41,390 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:21:44,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.3344, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.47}
[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.2162, 'learning_rate': 3.06e-05, 'epoch': 0.48}
[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.0826, 'learning_rate': 3.119999999999999e-05, 'epoch': 0.48}
[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8626, 'learning_rate': 3.1799999999999994e-05, 'epoch': 0.49}
[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:22:02,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]
{'loss': 4.8319, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.5}
5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]
{'loss': 4.771, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.51}
5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]
{'loss': 4.6895, 'learning_rate': 3.36e-05, 'epoch': 0.52}
5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]
{'loss': 4.693, 'learning_rate': 3.42e-05, 'epoch': 0.53}
5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]
{'loss': 4.7241, 'learning_rate': 3.48e-05, 'epoch': 0.54}
5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]
5%|███▉ | 56/1110 [22:01<7:30:08, 25.62s/it]
5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]
{'loss': 4.7604, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.55}
5%|████▎ | 61/1110 [24:12<7:29:55, 25.73s/it]
6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
{'loss': 4.6755, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.56}
6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
{'loss': 4.6489, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.57}
6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]
6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.658, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.57} 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6951, 'learning_rate': 3.78e-05, 'epoch': 0.58} 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 6%|████▍ | 62/1110 [24:36<7:24:13, 25.43s/it]g-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:27:50,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:27:54,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5903, 'learning_rate': 3.84e-05, 'epoch': 0.59}
[WARNING|modeling_utils.py:388] 2022-03-28 17:27:54,812 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:28:15,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6142, 'learning_rate': 3.9e-05, 'epoch': 0.6}
[WARNING|modeling_utils.py:388] 2022-03-28 17:28:23,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:28:27,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
6%|████▊ | 68/1110 [26:56<6:35:13, 22.76s/it]
{'loss': 4.6912, 'learning_rate': 3.96e-05, 'epoch': 0.61}
6%|████▊ | 68/1110 [26:56<6:35:13, 22.76s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 17:28:46,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:28:52,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:29:00,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6049, 'learning_rate': 4.02e-05, 'epoch': 0.62}
[WARNING|modeling_utils.py:388] 2022-03-28 17:29:00,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:08,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:14,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:16,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:18,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
6%|████▉ | 70/1110 [27:37<6:12:32, 21.49s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 17:29:22,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:29:25,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:29,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:31,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:33,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:35,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:37,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:39,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:41,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:43,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:45,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:46,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:48,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:50,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:53,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:55,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:29:57,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:00,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:01,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:04,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:05,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:07,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:09,401 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:11,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:13,684 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:15,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:17,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:19,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:20,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5642, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.67}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:25,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:29,481 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:33,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:36,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:40,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:44,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:47,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:51,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:54,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:30:58,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:01,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:05,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.1318, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.69} [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.1167, 'learning_rate': 4.56e-05, 'epoch': 0.7} [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.0646, 'learning_rate': 4.62e-05, 'epoch': 0.71} [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.8794, 'learning_rate': 4.68e-05, 'epoch': 0.72} [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6735, 'learning_rate': 4.7399999999999993e-05, 'epoch': 0.73} [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6389, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.74} [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6545, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.74} [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:31:08,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.6443, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.75}
{'loss': 4.6087, 'learning_rate': 4.98e-05, 'epoch': 0.76}
{'loss': 4.5696, 'learning_rate': 5.04e-05, 'epoch': 0.77}
{'loss': 4.5662, 'learning_rate': 5.1e-05, 'epoch': 0.78}
{'loss': 4.6608, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.79}
{'loss': 4.5441, 'learning_rate': 5.2199999999999995e-05, 'epoch': 0.8}
{'loss': 4.6029, 'learning_rate': 5.279999999999999e-05, 'epoch': 0.81}
[WARNING|modeling_utils.py:388] 2022-03-28 17:37:05,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4904, 'learning_rate': 5.339999999999999e-05, 'epoch': 0.82}
[WARNING|modeling_utils.py:388] 2022-03-28 17:37:31,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
  8%|██████▌ | 92/1110 [35:56<6:34:13, 23.24s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 17:37:51,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5483, 'learning_rate': 5.459999999999999e-05, 'epoch': 0.83}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:04,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:38:12,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:38:22,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5813, 'learning_rate': 5.519999999999999e-05, 'epoch': 0.84}
[WARNING|modeling_utils.py:388] 2022-03-28 17:38:28,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:33,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:35,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:38:39,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:38:41,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:38:41,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5633, 'learning_rate': 5.5799999999999994e-05, 'epoch': 0.85} [WARNING|modeling_bart.py:1051] 2022-03-28 17:38:45,355 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:38:47,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:38:49,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:38:51,561 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:38:53,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:38:55,415 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:57,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:38:59,322 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:01,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:02,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:04,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:08,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:09,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:11,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:12,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:14,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:17,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:20,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:21,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:23,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:25,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:27,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:29,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:31,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:33,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:36,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:37,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:38,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.492, 'learning_rate': 5.88e-05, 'epoch': 0.9}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:43,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:46,944 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:50,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:54,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:39:57,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:01,191 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:04,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:08,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.1473, 'learning_rate': 5.94e-05, 'epoch': 0.91}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:11,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:15,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:18,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:21,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:25,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:28,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:28,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:32,107 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.2295, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.91} [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.1966, 'learning_rate': 6.0599999999999996e-05, 'epoch': 0.92} [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:40:35,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.9692, 'learning_rate': 6.12e-05, 'epoch': 0.93} 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1110 [39:44<6:28:58, 23.20s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.8494, 'learning_rate': 6.18e-05, 'epoch': 0.94} 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.6873, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.95} 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▍ | 105/1110 [40:10<6:38:27, 23.79s/it] Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.639, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.96} [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 17:42:30,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:43:01,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:43:01,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 17:43:01,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:19:58,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.5857, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.97}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:09,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:13,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:19,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:43:23,715 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5445, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.98}
[WARNING|modeling_utils.py:388] 2022-03-28 17:43:29,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:43:31,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:43:33,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:37,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:39,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:41,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
10%|███████▋ | 110/1110 [41:59<5:48:00, 20.88s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:43,231 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:44,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:48,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:49,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:51,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
10%|███████▊ | 111/1110 [42:10<4:59:21, 17.98s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:55,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:56,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:43:58,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:02,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:06,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:10,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:13,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:44:17,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 6.9572, 'learning_rate': 6.599999999999999e-05, 'epoch': 1.01}
10%|███████▉ | 113/1110 [43:13<6:47:53, 24.55s/it]
{'loss': 4.5871, 'learning_rate': 6.659999999999999e-05, 'epoch': 1.02}
{'loss': 4.4989, 'learning_rate': 6.72e-05, 'epoch': 1.03}
{'loss': 4.4903, 'learning_rate': 6.78e-05, 'epoch': 1.04}
10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it]
{'loss': 4.4851, 'learning_rate': 6.84e-05, 'epoch': 1.04}
{'loss': 4.4768, 'learning_rate': 6.9e-05, 'epoch': 1.05}
{'loss': 4.4474, 'learning_rate': 6.96e-05, 'epoch': 1.06}
{'loss': 4.4565, 'learning_rate': 7.02e-05, 'epoch': 1.07}
10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4612, 'learning_rate': 7.079999999999999e-05, 'epoch': 1.08} 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.4651, 'learning_rate': 7.139999999999999e-05, 'epoch': 1.09} 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3432, 'learning_rate': 7.199999999999999e-05, 'epoch': 1.1} 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3871, 'learning_rate': 7.259999999999999e-05, 'epoch': 1.11} 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.3813, 'learning_rate': 7.319999999999999e-05, 'epoch': 1.12} 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
10%|████████▏ | 116/1110 [44:34<7:14:56, 26.25s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3116, 'learning_rate': 7.379999999999999e-05, 'epoch': 1.13} 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.2469, 'learning_rate': 7.439999999999999e-05, 'epoch': 1.13} 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.2718, 'learning_rate': 7.5e-05, 'epoch': 1.14} 11%|████████▊ | 125/1110 [48:24<6:53:43, 25.20s/it] Setting `use_cache=False`...1] 2022-03-28 17:43:54,175 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:07,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:51:13,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3074, 'learning_rate': 7.56e-05, 'epoch': 1.15}
12%|█████████ | 129/1110 [49:54<6:12:13, 22.77s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 17:51:40,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:51:46,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:51:54,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3321, 'learning_rate': 7.68e-05, 'epoch': 1.17}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:01,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:04,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:52:08,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:52:10,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 17:52:16,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2847, 'learning_rate': 7.74e-05, 'epoch': 1.18}
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:21,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:23,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:25,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:27,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:29,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:31,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:33,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▎ | 132/1110 [50:51<5:28:03, 20.13s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:35,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:37,858 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:39,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:41,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:43,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:45,227 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:46,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▎ | 133/1110 [51:06<5:01:43, 18.53s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:50,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:52,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:53,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:55,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:58,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:52:59,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▍ | 134/1110 [51:18<4:28:31, 16.51s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:02,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:04,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:05,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:07,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▍ | 135/1110 [51:27<3:51:14, 14.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:10,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:12,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:14,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:16,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▌ | 136/1110 [51:34<3:15:47, 12.06s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:18,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:22,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:26,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:29,986 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:33,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:37,121 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:40,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:44,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▋ | 137/1110 [52:05<4:47:25, 17.72s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:53,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:53:56,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:54:00,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:54:03,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 17:54:07,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.6934, 'learning_rate': 8.16e-05, 'epoch': 1.24}
{'loss': 5.5843, 'learning_rate': 8.22e-05, 'epoch': 1.25}
13%|█████████▊ | 140/1110 [53:27<6:29:00, 24.06s/it]
{'loss': 5.3033, 'learning_rate': 8.28e-05, 'epoch': 1.26}
[WARNING|modeling_utils.py:388] 2022-03-28 17:55:36,020 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9826, 'learning_rate': 8.34e-05, 'epoch': 1.27}
13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]
{'loss': 4.7511, 'learning_rate': 8.4e-05, 'epoch': 1.28}
{'loss': 4.5923, 'learning_rate': 8.459999999999998e-05, 'epoch': 1.29}
{'loss': 4.4865, 'learning_rate': 8.519999999999998e-05, 'epoch': 1.3} 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4707, 'learning_rate': 8.579999999999998e-05, 'epoch': 1.3} 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4407, 'learning_rate': 8.639999999999999e-05, 'epoch': 1.31} 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 13%|█████████▉ | 142/1110 [54:20<6:47:52, 25.28s/it]g-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 17:57:58,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4704, 'learning_rate': 8.699999999999999e-05, 'epoch': 1.32}
{'loss': 4.3955, 'learning_rate': 8.759999999999999e-05, 'epoch': 1.33}
{'loss': 4.4305, 'learning_rate': 8.819999999999999e-05, 'epoch': 1.34}
14%|██████████▌ | 150/1110 [57:43<6:38:46, 24.92s/it]
{'loss': 4.4178, 'learning_rate': 8.879999999999999e-05, 'epoch': 1.35}
{'loss': 4.3738, 'learning_rate': 8.939999999999999e-05, 'epoch': 1.36}
{'loss': 4.345, 'learning_rate': 8.999999999999999e-05, 'epoch': 1.37}
[WARNING|modeling_utils.py:388] 2022-03-28 18:00:34,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3189, 'learning_rate': 9.059999999999999e-05, 'epoch': 1.38}
[WARNING|modeling_utils.py:388] 2022-03-28 18:00:38,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:00:46,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:00:51,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3548, 'learning_rate': 9.12e-05, 'epoch': 1.39}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:01,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:01:06,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:01:13,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3476, 'learning_rate': 9.18e-05, 'epoch': 1.39}
[WARNING|modeling_utils.py:388] 2022-03-28 18:01:19,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:23,683 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:01:27,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:35,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:01:35,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3373, 'learning_rate': 9.24e-05, 'epoch': 1.4} [WARNING|modeling_utils.py:388] 2022-03-28 18:01:39,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:01:41,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:01:44,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:01:44,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:01:47,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 17:53:49,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:49,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:51,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
14%|██████████▋ | 157/1110 [1:00:09<5:20:46, 20.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:54,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:56,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:57,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:01:59,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:01,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:03,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:05,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
14%|██████████▊ | 158/1110 [1:00:24<4:54:18, 18.55s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:08,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:10,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:11,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:13,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:16,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:17,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
14%|██████████▉ | 159/1110 [1:00:36<4:23:19, 16.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:20,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:21,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:24,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:26,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:28,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:29,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:30,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:32,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:34,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
15%|███████████ | 161/1110 [1:00:53<3:13:02, 12.21s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:37,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:41,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:45,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:48,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:52,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:56,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:02:59,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:03,254 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
15%|███████████ | 162/1110 [1:01:24<4:42:40, 17.89s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:12,498 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:15,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:19,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:22,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:26,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.5254, 'learning_rate': 9.659999999999999e-05, 'epoch': 1.47}
{'loss': 5.3926, 'learning_rate': 9.719999999999999e-05, 'epoch': 1.48}
{'loss': 5.1976, 'learning_rate': 9.779999999999999e-05, 'epoch': 1.48}
{'loss': 4.84, 'learning_rate': 9.839999999999999e-05, 'epoch': 1.49}
{'loss': 4.6971, 'learning_rate': 9.9e-05, 'epoch': 1.5}
{'loss': 4.6292, 'learning_rate': 9.96e-05, 'epoch': 1.51}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.5307, 'learning_rate': 0.0001002, 'epoch': 1.52} [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4417, 'learning_rate': 0.0001008, 'epoch': 1.53} [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:03:29,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.471, 'learning_rate': 0.0001014, 'epoch': 1.54} 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▋ | 171/1110 [1:05:22<6:40:32, 25.59s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▊ | 172/1110 [1:05:47<6:36:11, 25.34s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 15%|███████████▊ | 172/1110 [1:05:47<6:36:11, 25.34s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4859, 'learning_rate': 0.000102, 'epoch': 1.55} 15%|███████████▊ | 172/1110 [1:05:47<6:36:11, 25.34s/it] Setting `use_cache=False`...1] 2022-03-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 18:07:39,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
16%|███████████▊ | 173/1110 [1:06:11<6:30:50, 25.03s/it]
{'loss': 4.4296, 'learning_rate': 0.0001026, 'epoch': 1.56}
[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.415, 'learning_rate': 0.00010319999999999999, 'epoch': 1.57}
{'loss': 4.346, 'learning_rate': 0.00010379999999999999, 'epoch': 1.57}
{'loss': 4.3523, 'learning_rate': 0.00010439999999999999, 'epoch': 1.58}
{'loss': 4.2742, 'learning_rate': 0.00010499999999999999, 'epoch': 1.59}
[WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:07:59,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:09:48,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:09:48,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:09:52,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3161, 'learning_rate': 0.00010559999999999998, 'epoch': 1.6}
[WARNING|modeling_utils.py:388] 2022-03-28 18:10:04,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:10:10,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.3355, 'learning_rate': 0.00010619999999999998, 'epoch': 1.61}
[WARNING|modeling_utils.py:388] 2022-03-28 18:10:10,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:10:21,055 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:10:27,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:10:33,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2709, 'learning_rate': 0.00010679999999999998, 'epoch': 1.62} [WARNING|modeling_utils.py:388] 2022-03-28 18:10:37,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 18:10:41,195 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:10:45,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:10:45,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:10:51,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:10:53,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.2814, 'learning_rate': 0.00010739999999999998, 'epoch': 1.63} [WARNING|modeling_bart.py:1051] 2022-03-28 18:10:57,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:10:59,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:01,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:03,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:05,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:07,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:09,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:11,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:13,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:14,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:16,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:19,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:21,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:23,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:24,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:27,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:29,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:31,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:33,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:35,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:38,249 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:40,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:42,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:44,567 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:46,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:48,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:11:50,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:53,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:11:57,557 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:01,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:04,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:08,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:11,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:15,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:18,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.436, 'learning_rate': 0.00011099999999999999, 'epoch': 1.68} [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:24,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:28,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:31,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:35,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:38,540 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:41,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:45,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:45,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.7089, 'learning_rate': 0.00011159999999999999, 'epoch': 1.69}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:12:48,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.6178, 'learning_rate': 0.00011219999999999999, 'epoch': 1.7}
{'loss': 5.3762, 'learning_rate': 0.00011279999999999999, 'epoch': 1.71}
{'loss': 4.9978, 'learning_rate': 0.00011339999999999999, 'epoch': 1.72}
{'loss': 4.7483, 'learning_rate': 0.00011399999999999999, 'epoch': 1.73}
{'loss': 4.6654, 'learning_rate': 0.0001146, 'epoch': 1.74}
{'loss': 4.585, 'learning_rate': 0.0001152, 'epoch': 1.74}
{'loss': 4.4605, 'learning_rate': 0.0001158, 'epoch': 1.75}
{'loss': 4.4682, 'learning_rate': 0.0001164, 'epoch': 1.76}
{'loss': 4.4438, 'learning_rate': 0.000117, 'epoch': 1.77}
{'loss': 4.3513, 'learning_rate': 0.0001176, 'epoch': 1.78}
18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it]
{'loss': 4.3653, 'learning_rate': 0.0001182, 'epoch': 1.79}
18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.2853, 'learning_rate': 0.0001188, 'epoch': 1.8} 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▋ | 199/1110 [1:15:51<6:16:20, 24.79s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.2774, 'learning_rate': 0.0001194, 'epoch': 1.81} 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3431, 'learning_rate': 0.00011999999999999999, 'epoch': 1.82} 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 18%|█████████████▊ | 201/1110 [1:16:39<6:11:18, 24.51s/it] Setting `use_cache=False`...e computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 18:19:06,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▉ | 203/1110 [1:17:25<5:55:16, 23.50s/it]
{'loss': 4.2969, 'learning_rate': 0.00012059999999999999, 'epoch': 1.83}
[WARNING|modeling_utils.py:388] 2022-03-28 18:19:16,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3317, 'learning_rate': 0.00012119999999999999, 'epoch': 1.83}
[WARNING|modeling_utils.py:388] 2022-03-28 18:19:35,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:19:45,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2546, 'learning_rate': 0.00012179999999999999, 'epoch': 1.84}
[WARNING|modeling_utils.py:388] 2022-03-28 18:19:55,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:03,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:10,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████ | 206/1110 [1:18:28<5:29:26, 21.87s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:14,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:16,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:18,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:22,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:24,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:26,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:20:28,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:32,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:34,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:36,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:37,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:39,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:41,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:43,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:43,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:46,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:48,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:49,720 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:52,718 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:54,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:20:56,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:56,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:20:58,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:00,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:02,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:04,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:04,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:21:07,070 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:08,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:10,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:12,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:12,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:14,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:14,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:21:18,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:18,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:25,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:28,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:28,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:21:32,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:32,257 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:35,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:39,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:21:44,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:44,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:48,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:51,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:51,419 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:54,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:21:54,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:21:58,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.3656, 'learning_rate': 0.0001266, 'epoch': 1.91}
[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.2901, 'learning_rate': 0.00012719999999999997, 'epoch': 1.92}
[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9679, 'learning_rate': 0.0001278, 'epoch': 1.93}
[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.74, 'learning_rate': 0.00012839999999999998, 'epoch': 1.94}
[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6402, 'learning_rate': 0.000129, 'epoch': 1.95}
[WARNING|modeling_utils.py:388] 2022-03-28 18:22:01,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:24:02,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:09,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5565, 'learning_rate': 0.00012959999999999998, 'epoch': 1.96}
[WARNING|modeling_utils.py:388] 2022-03-28 18:24:17,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3116, 'learning_rate': 0.0001302, 'epoch': 1.97}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:24:41,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:24:49,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:24:55,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:25:00,323 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:04,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:06,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:08,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:10,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:12,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:13,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:15,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:17,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:20,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:22,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:23,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:26,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:28,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:31,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:34,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:38,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:41,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:45,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:49,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:25:52,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 6.599, 'learning_rate': 0.0001326, 'epoch': 2.01}
{'loss': 4.4173, 'learning_rate': 0.00013319999999999999, 'epoch': 2.02}
{'loss': 4.423, 'learning_rate': 0.0001338, 'epoch': 2.03}
{'loss': 4.3792, 'learning_rate': 0.0001344, 'epoch': 2.04}
{'loss': 4.3279, 'learning_rate': 0.000135, 'epoch': 2.04}
{'loss': 4.2576, 'learning_rate': 0.0001356, 'epoch': 2.05}
{'loss': 4.1748, 'learning_rate': 0.0001362, 'epoch': 2.06}
{'loss': 4.1601, 'learning_rate': 0.0001368, 'epoch': 2.07}
21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]
{'loss': 4.1532, 'learning_rate': 0.0001374, 'epoch': 2.08}
{'loss': 4.08, 'learning_rate': 0.000138, 'epoch': 2.09}
21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.089, 'learning_rate': 0.0001386, 'epoch': 2.1} 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.0562, 'learning_rate': 0.0001392, 'epoch': 2.11} 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 21%|███████████████▊ | 231/1110 [1:27:50<6:26:18, 26.37s/it]g-point operations will not be computed-28 18:03:08,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:31:06,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
21%|████████████████ | 235/1110 [1:29:28<6:02:57, 24.89s/it]
{'loss': 4.0354, 'learning_rate': 0.00013979999999999998, 'epoch': 2.12}
{'loss': 3.9942, 'learning_rate': 0.0001404, 'epoch': 2.13}
[WARNING|modeling_utils.py:388] 2022-03-28 18:31:53,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9382, 'learning_rate': 0.00014099999999999998, 'epoch': 2.13}
[WARNING|modeling_utils.py:388] 2022-03-28 18:32:26,535 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:32:30,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:32:41,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.8866, 'learning_rate': 0.0001422, 'epoch': 2.15}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:32:53,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
22%|████████████████▍ | 240/1110 [1:31:23<5:29:45, 22.74s/it]
{'loss': 3.8903, 'learning_rate': 0.00014279999999999997, 'epoch': 2.16}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:11,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:17,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:33:25,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.801, 'learning_rate': 0.0001434, 'epoch': 2.17}
[WARNING|modeling_utils.py:388] 2022-03-28 18:33:31,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:33:35,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:33:39,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:33:42,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
22%|████████████████▌ | 242/1110 [1:32:01<5:02:37, 20.92s/it][WARNING|modeling_bart.py:1051] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:33:48,301 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:33:50,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:33:52,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:33:54,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:33:56,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:33:56,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:33:56,527 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:34:02,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:34:02,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:34:04,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:34:06,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:34:08,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:34:09,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:11,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:14,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:16,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:18,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:19,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:22,740 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:24,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:26,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:28,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:30,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:33,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:35,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:37,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:39,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:41,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:42,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:44,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:46,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:50,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:54,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:34:57,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:01,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:04,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:08,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:11,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:15,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:19,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:22,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:26,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:29,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:32,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:36,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:35:39,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.738, 'learning_rate': 0.0001482, 'epoch': 2.24}
{'loss': 5.4078, 'learning_rate': 0.00014879999999999998, 'epoch': 2.25}
{'loss': 5.0056, 'learning_rate': 0.0001494, 'epoch': 2.26}
[WARNING|modeling_utils.py:388] 2022-03-28 18:37:04,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8029, 'learning_rate': 0.00015, 'epoch': 2.27}
23%|█████████████████▎ | 253/1110 [1:35:47<6:01:50, 25.33s/it]
{'loss': 4.6233, 'learning_rate': 0.00015059999999999997, 'epoch': 2.28}
{'loss': 4.4243, 'learning_rate': 0.0001512, 'epoch': 2.29}
{'loss': 4.2563, 'learning_rate': 0.00015179999999999998, 'epoch': 2.3}
23%|█████████████████▌ | 256/1110 [1:37:07<6:13:12, 26.22s/it]
{'loss': 4.2108, 'learning_rate': 0.0001524, 'epoch': 2.3}
{'loss': 4.1417, 'learning_rate': 0.00015299999999999998, 'epoch': 2.31}
{'loss': 4.2294, 'learning_rate': 0.0001536, 'epoch': 2.32}
{'loss': 4.1063, 'learning_rate': 0.00015419999999999998, 'epoch': 2.33}
23%|█████████████████▊ | 260/1110 [1:38:45<5:53:06, 24.92s/it]
{'loss': 4.0417, 'learning_rate': 0.0001548, 'epoch': 2.34}
{'loss': 3.9855, 'learning_rate': 0.00015539999999999998, 'epoch': 2.35}
24%|█████████████████▉ | 262/1110 [1:39:34<5:49:19, 24.72s/it]
{'loss': 3.9049, 'learning_rate': 0.000156, 'epoch': 2.36}
24%|██████████████████ | 263/1110 [1:39:57<5:41:01, 24.16s/it]
{'loss': 3.9212, 'learning_rate': 0.00015659999999999998, 'epoch': 2.37}
{'loss': 3.8797, 'learning_rate': 0.0001572, 'epoch': 2.38}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:11,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:42:20,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
24%|██████████████████▏ | 265/1110 [1:40:40<5:20:47, 22.78s/it]
{'loss': 3.9043, 'learning_rate': 0.0001578, 'epoch': 2.39}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:32,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:36,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:42,622 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
24%|██████████████████▏ | 266/1110 [1:41:00<5:09:43, 22.02s/it]
{'loss': 3.8784, 'learning_rate': 0.0001584, 'epoch': 2.39}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:48,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:42:52,809 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:57,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:42:59,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:43:03,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8376, 'learning_rate': 0.000159, 'epoch': 2.4}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:07,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:09,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:11,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:43:15,308 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:19,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:21,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:23,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:25,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:27,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:29,440 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:31,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:33,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:34,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:38,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:39,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:41,540 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:43,108 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:46,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:47,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:50,344 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:51,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:54,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:56,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:43:58,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:00,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:02,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:04,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:05,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:08,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:12,557 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:16,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:19,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:23,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:26,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:30,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:33,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.6041, 'learning_rate': 0.0001626, 'epoch': 2.46}
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:37,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:41,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:44,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:48,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:51,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:44:55,113 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.8567, 'learning_rate': 0.0001632, 'epoch': 2.47}
{'loss': 5.6862, 'learning_rate': 0.0001638, 'epoch': 2.48}
 25%|██████████████████▉ | 276/1110 [1:44:15<5:36:11, 24.19s/it]
{'loss': 5.4036, 'learning_rate': 0.0001644, 'epoch': 2.48}
 25%|██████████████████▉ | 277/1110 [1:44:42<5:45:37, 24.89s/it]
{'loss': 4.9913, 'learning_rate': 0.000165, 'epoch': 2.49}
{'loss': 4.7303, 'learning_rate': 0.0001656, 'epoch': 2.5}
{'loss': 4.5465, 'learning_rate': 0.0001662, 'epoch': 2.51}
{'loss': 4.33, 'learning_rate': 0.0001668, 'epoch': 2.52}
{'loss': 4.2422, 'learning_rate': 0.0001674, 'epoch': 2.53}
25%|███████████████████▎ | 282/1110 [1:46:51<5:54:39, 25.70s/it]
{'loss': 4.2522, 'learning_rate': 0.000168, 'epoch': 2.54}
25%|███████████████████▍ | 283/1110 [1:47:16<5:50:20, 25.42s/it]
{'loss': 4.1028, 'learning_rate': 0.00016919999999999997, 'epoch': 2.56}
26%|███████████████████▌ | 285/1110 [1:48:04<5:40:50, 24.79s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 18:50:09,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0649, 'learning_rate': 0.00017039999999999997, 'epoch': 2.57}
{'loss': 4.0507, 'learning_rate': 0.00017099999999999998, 'epoch': 2.58}
{'loss': 3.921, 'learning_rate': 0.00017159999999999997, 'epoch': 2.59}
[WARNING|modeling_utils.py:388] 2022-03-28 18:51:18,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9083, 'learning_rate': 0.00017219999999999998, 'epoch': 2.6}
[WARNING|modeling_utils.py:388] 2022-03-28 18:51:27,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:51:41,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8807, 'learning_rate': 0.00017279999999999997, 'epoch': 2.61}
[WARNING|modeling_utils.py:388] 2022-03-28 18:51:51,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:52:01,678 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9007, 'learning_rate': 0.00017339999999999996, 'epoch': 2.62}
[WARNING|modeling_utils.py:388] 2022-03-28 18:52:07,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:12,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:17,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 18:52:21,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 18:52:25,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:52:28,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:52:30,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:52:32,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 18:52:34,378 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 18:52:38,423 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:52:40,393 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:42,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:44,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:46,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:48,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:49,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:52:51,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:54,850 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:56,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:58,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:52:59,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:02,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:53:03,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:06,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:07,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:10,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:12,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:14,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:53:16,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:18,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:21,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:22,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 3.4456, 'learning_rate': 0.00017699999999999997, 'epoch': 2.67}
[WARNING|modeling_utils.py:388] 2022-03-28 18:53:26,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:30,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:33,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:37,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:40,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:53:44,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:47,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:51,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 5.4417, 'learning_rate': 0.00017759999999999998, 'epoch': 2.68}
[WARNING|modeling_utils.py:388] 2022-03-28 18:53:55,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:53:58,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:54:01,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:54:05,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:54:08,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 18:54:12,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 18:54:15,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 5.4952, 'learning_rate': 0.00017819999999999997, 'epoch': 2.69}
27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it] {'loss': 5.2771, 'learning_rate': 0.00017879999999999998, 'epoch': 2.7} {'loss': 4.9689, 'learning_rate': 0.00017939999999999997, 'epoch': 2.71} {'loss': 4.709, 'learning_rate': 0.00017999999999999998, 'epoch': 2.72}
27%|████████████████████▌ | 300/1110 [1:53:05<5:08:13, 22.83s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.3383, 'learning_rate': 0.00018119999999999999, 'epoch': 2.74}
27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]
{'loss': 4.2212, 'learning_rate': 0.00018179999999999997, 'epoch': 2.74}
27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]
{'loss': 4.2254, 'learning_rate': 0.0001824, 'epoch': 2.75}
27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]
{'loss': 4.2006, 'learning_rate': 0.00018299999999999998, 'epoch': 2.76}
27%|████████████████████▋ | 303/1110 [1:54:24<5:39:36, 25.25s/it]
28%|█████████████████████ | 308/1110 [1:56:34<5:41:12, 25.53s/it]
{'loss': 4.0235, 'learning_rate': 0.00018419999999999998, 'epoch': 2.78}
28%|█████████████████████ | 308/1110 [1:56:34<5:41:12, 25.53s/it]
{'loss': 4.0431, 'learning_rate': 0.0001848, 'epoch': 2.79}
28%|█████████████████████ | 308/1110 [1:56:34<5:41:12, 25.53s/it]
28%|█████████████████████▎ | 311/1110 [1:57:46<5:26:21, 24.51s/it]
{'loss': 3.9653, 'learning_rate': 0.00018539999999999998, 'epoch': 2.8}
[WARNING|modeling_utils.py:388] 2022-03-28 18:59:34,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9405, 'learning_rate': 0.000186, 'epoch': 2.81}
[WARNING|modeling_utils.py:388] 2022-03-28 18:59:34,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8778, 'learning_rate': 0.00018659999999999998, 'epoch': 2.82}
[WARNING|modeling_utils.py:388] 2022-03-28 18:59:34,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:00:37,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8905, 'learning_rate': 0.0001872, 'epoch': 2.83}
[WARNING|modeling_utils.py:388] 2022-03-28 19:00:37,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:00:59,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.8976, 'learning_rate': 0.00018779999999999998, 'epoch': 2.83}
[WARNING|modeling_utils.py:388] 2022-03-28 19:01:03,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:01:15,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.8547, 'learning_rate': 0.00018839999999999997, 'epoch': 2.84} [WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:20,167 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:30,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:30,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:30,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:01:36,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:36,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:01:40,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:01:40,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.8121, 'learning_rate': 0.00018899999999999999, 'epoch': 2.85} [WARNING|modeling_utils.py:388] 2022-03-28 19:01:44,217 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:46,431 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:48,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:01:48,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:01:52,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:01:52,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:01:52,535 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:58,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:01:58,739 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:02:00,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:02,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:04,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:06,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:08,461 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:10,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:11,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:13,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:15,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:18,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:20,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:21,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:24,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:25,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:28,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:30,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:33,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:35,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:37,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:39,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:40,795 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:42,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:44,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:48,375 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:51,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:55,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:02:58,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:02,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:05,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:09,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:12,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:16,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:19,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:23,255 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:26,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:30,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:03:33,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9258, 'learning_rate': 0.00019319999999999998, 'epoch': 2.91}
29%|██████████████████████▎ | 325/1110 [2:02:23<4:53:41, 22.45s/it]
{'loss': 4.2863, 'learning_rate': 0.00019439999999999998, 'epoch': 2.93}
{'loss': 4.1943, 'learning_rate': 0.000195, 'epoch': 2.94}
{'loss': 4.2371, 'learning_rate': 0.00019559999999999998, 'epoch': 2.95}
30%|██████████████████████▌ | 329/1110 [2:04:01<5:09:36, 23.79s/it]
{'loss': 4.0108, 'learning_rate': 0.0001962, 'epoch': 2.96}
[WARNING|modeling_utils.py:388] 2022-03-28 19:05:49,204 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:05:53,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:05:57,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9665, 'learning_rate': 0.00019679999999999999, 'epoch': 2.97}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:06:11,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:17,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:23,590 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:25,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9371, 'learning_rate': 0.0001974, 'epoch': 2.98}
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:31,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:34,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:36,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:38,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:40,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:41,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:43,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:45,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:47,239 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:50,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:51,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:53,865 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:56,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:57,852 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:06:58,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:07:01,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:07:04,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:07:08,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:07:11,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:07:15,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:07:19,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:07:22,575 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 6.1462, 'learning_rate': 0.0001992, 'epoch': 3.01}
30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]
{'loss': 4.3462, 'learning_rate': 0.0001998, 'epoch': 3.02}
30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.1581, 'learning_rate': 0.0002004, 'epoch': 3.03} 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1832, 'learning_rate': 0.000201, 'epoch': 3.04} 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1241, 'learning_rate': 0.0002016, 'epoch': 3.04} 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.9371, 'learning_rate': 0.0002022, 'epoch': 3.05} 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 30%|██████████████████████▉ | 335/1110 [2:06:12<5:07:38, 23.82s/it]g-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.8334, 'learning_rate': 0.0002028, 'epoch': 3.06}
{'loss': 3.7693, 'learning_rate': 0.00020339999999999998, 'epoch': 3.07}
{'loss': 3.7465, 'learning_rate': 0.000204, 'epoch': 3.08}
{'loss': 3.6217, 'learning_rate': 0.00020459999999999999, 'epoch': 3.09}
31%|███████████████████████▌ | 344/1110 [2:10:11<5:30:09, 25.86s/it]
{'loss': 3.6467, 'learning_rate': 0.0002052, 'epoch': 3.1}
31%|███████████████████████▌ | 345/1110 [2:10:35<5:24:09, 25.42s/it]
{'loss': 3.5401, 'learning_rate': 0.0002058, 'epoch': 3.11}
{'loss': 3.5786, 'learning_rate': 0.00020639999999999998, 'epoch': 3.12}
31%|███████████████████████▊ | 347/1110 [2:11:23<5:12:29, 24.57s/it]
{'loss': 3.4404, 'learning_rate': 0.00020699999999999996, 'epoch': 3.13}
[WARNING|modeling_utils.py:388] 2022-03-28 19:13:16,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.3864, 'learning_rate': 0.00020759999999999998, 'epoch': 3.13}
{'loss': 3.2147, 'learning_rate': 0.00020819999999999996, 'epoch': 3.14}
[WARNING|modeling_utils.py:388] 2022-03-28 19:14:18,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:14:22,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 18:33:46,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.34, 'learning_rate': 0.00020939999999999997, 'epoch': 3.16} 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████ | 351/1110 [2:12:54<4:50:09, 22.94s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:14:38,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:14:54,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
32%|████████████████████████ | 352/1110 [2:13:15<4:40:43, 22.22s/it]
{'loss': 3.2556, 'learning_rate': 0.00020999999999999998, 'epoch': 3.17}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:03,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:15:07,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:11,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:15:15,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:15:17,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2095, 'learning_rate': 0.00021059999999999997, 'epoch': 3.18}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:21,826 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:25,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:27,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:29,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:31,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:33,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▏ | 354/1110 [2:13:51<4:12:54, 20.07s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:35,537 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:37,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:39,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:41,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:42,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:44,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:47,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:49,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▎ | 355/1110 [2:14:07<3:56:58, 18.83s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:51,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:53,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:56,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:57,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:15:58,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:01,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:03,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:04,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:05,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:07,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:09,814 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▍ | 357/1110 [2:14:28<3:01:06, 14.43s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:12,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:13,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:16,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:18,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▌ | 358/1110 [2:14:35<2:32:52, 12.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:19,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:23,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:27,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:30,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:34,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:38,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:16:38,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:19,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:41,557 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:19,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:45,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:19,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:45,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:19,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:45,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:19,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████▌ | 359/1110 [2:15:04<3:35:15, 17.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 32%|████████████████████████▌ | 359/1110 [2:15:04<3:35:15, 17.20s/it][WARNING|modeling_bart.py:1051] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:52,233 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:55,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:55,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:59,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:16:59,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:02,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:06,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:06,185 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.7524, 'learning_rate': 0.00021479999999999996, 'epoch': 3.24} [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.2796, 'learning_rate': 0.00021539999999999998, 'epoch': 3.25} [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.8365, 'learning_rate': 0.00021599999999999996, 'epoch': 3.26} [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:17:09,618 >> `use_cache=True` is incompatible with gradient checkpointing. 
{'loss': 4.4809, 'learning_rate': 0.00021659999999999998, 'epoch': 3.27}
{'loss': 4.1632, 'learning_rate': 0.00021719999999999997, 'epoch': 3.28}
33%|████████████████████████▉ | 365/1110 [2:17:47<5:17:21, 25.56s/it]
{'loss': 3.9396, 'learning_rate': 0.00021839999999999997, 'epoch': 3.3}
33%|█████████████████████████▏ | 367/1110 [2:18:38<5:15:50, 25.51s/it]
{'loss': 3.8539, 'learning_rate': 0.00021899999999999998, 'epoch': 3.3}
33%|█████████████████████████▏ | 368/1110 [2:19:05<5:20:50, 25.94s/it]
{'loss': 3.7321, 'learning_rate': 0.00021959999999999997, 'epoch': 3.31}
{'loss': 3.6681, 'learning_rate': 0.00022019999999999999, 'epoch': 3.32}
{'loss': 3.6725, 'learning_rate': 0.00022079999999999997, 'epoch': 3.33}
33%|█████████████████████████▍ | 371/1110 [2:20:18<5:07:02, 24.93s/it]
{'loss': 3.5996, 'learning_rate': 0.0002214, 'epoch': 3.34}
{'loss': 3.502, 'learning_rate': 0.00022199999999999998, 'epoch': 3.35}
34%|█████████████████████████▌ | 373/1110 [2:21:05<4:56:06, 24.11s/it]
34%|█████████████████████████▌ | 374/1110 [2:21:30<4:57:46, 24.27s/it]
{'loss': 3.3827, 'learning_rate': 0.00022319999999999998, 'epoch': 3.37}
{'loss': 3.3173, 'learning_rate': 0.0002238, 'epoch': 3.38}
[WARNING|modeling_utils.py:388] 2022-03-28 19:23:50,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:23:56,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.315, 'learning_rate': 0.00022439999999999998, 'epoch': 3.39}
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:00,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:11,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:17,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.3281, 'learning_rate': 0.000225, 'epoch': 3.39}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:24:25,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:29,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:24:33,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:37,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.27, 'learning_rate': 0.00022559999999999998, 'epoch': 3.4}
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:41,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:43,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:45,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:47,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:49,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:51,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:53,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:56,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:57,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:24:59,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:25:01,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:25:03,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:08,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:10,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:11,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:13,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:15,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:18,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:19,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:22,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:23,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:26,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:28,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:30,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:31,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:34,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:36,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:38,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:38,884 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:42,376 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:45,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:49,602 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:53,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:25:56,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:00,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:03,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:07,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:10,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:14,495 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:17,983 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:26:24,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.2546, 'learning_rate': 0.00022979999999999997, 'epoch': 3.47}
{'loss': 4.9179, 'learning_rate': 0.0002304, 'epoch': 3.48}
{'loss': 4.6224, 'learning_rate': 0.00023099999999999998, 'epoch': 3.48}
{'loss': 4.3587, 'learning_rate': 0.0002316, 'epoch': 3.49}
{'loss': 4.0705, 'learning_rate': 0.00023219999999999998, 'epoch': 3.5}
{'loss': 3.9197, 'learning_rate': 0.0002328, 'epoch': 3.51}
{'loss': 3.8546, 'learning_rate': 0.00023339999999999998, 'epoch': 3.52}
{'loss': 3.7795, 'learning_rate': 0.000234, 'epoch': 3.53}
{'loss': 3.7303, 'learning_rate': 0.00023459999999999998, 'epoch': 3.54}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.7024, 'learning_rate': 0.0002352, 'epoch': 3.55}
{'loss': 3.5815, 'learning_rate': 0.00023579999999999999, 'epoch': 3.56}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.5363, 'learning_rate': 0.0002364, 'epoch': 3.57} [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 19:30:21,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:31:44,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.494, 'learning_rate': 0.000237, 'epoch': 3.57}
36%|███████████████████████████▎ | 398/1110 [2:30:26<4:46:21, 24.13s/it]
{'loss': 3.3856, 'learning_rate': 0.0002376, 'epoch': 3.58}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:36,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:32:40,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:32:50,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:32:54,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:32:58,709 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:08,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:33:16,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.311, 'learning_rate': 0.0002394, 'epoch': 3.61}
[WARNING|modeling_utils.py:388] 2022-03-28 19:33:27,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:33:33,338 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
36%|███████████████████████████▌ | 402/1110 [2:31:53<4:19:20, 21.98s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 19:33:39,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:43,799 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:33:47,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:51,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:33:54,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
36%|███████████████████████████▌ | 403/1110 [2:32:12<4:07:22, 20.99s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 19:33:57,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:01,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:03,885 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:05,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:07,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:09,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:11,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:13,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:15,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:17,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:19,305 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:21,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:22,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:26,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:27,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:29,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:32,493 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:33,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:35,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:38,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:39,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:41,996 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:44,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:46,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:48,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:50,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:52,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:54,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:55,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:34:58,836 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:02,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:06,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:09,714 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:13,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:16,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:20,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:23,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:27,438 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:30,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:34,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:37,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:41,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:35:44,796 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.6085, 'learning_rate': 0.0002448, 'epoch': 3.69}
{'loss': 5.2909, 'learning_rate': 0.00024539999999999995, 'epoch': 3.7}
{'loss': 4.7823, 'learning_rate': 0.00024599999999999996, 'epoch': 3.71}
 37%|████████████████████████████▎ | 413/1110 [2:35:32<4:50:28, 25.00s/it]
{'loss': 4.4985, 'learning_rate': 0.0002466, 'epoch': 3.72}
{'loss': 4.2019, 'learning_rate': 0.0002472, 'epoch': 3.73}
{'loss': 4.024, 'learning_rate': 0.00024779999999999995, 'epoch': 3.74}
{'loss': 3.8789, 'learning_rate': 0.00024839999999999997, 'epoch': 3.74}
{'loss': 3.7719, 'learning_rate': 0.000249, 'epoch': 3.75}
{'loss': 3.773, 'learning_rate': 0.00024959999999999994, 'epoch': 3.76}
{'loss': 3.5885, 'learning_rate': 0.00025019999999999996, 'epoch': 3.77}
{'loss': 3.5845, 'learning_rate': 0.00025079999999999997, 'epoch': 3.78}
38%|████████████████████████████▊ | 421/1110 [2:38:56<4:47:35, 25.04s/it]
{'loss': 3.523, 'learning_rate': 0.0002514, 'epoch': 3.79}
{'loss': 3.4934, 'learning_rate': 0.00025199999999999995, 'epoch': 3.8}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:41:15,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.4074, 'learning_rate': 0.00025259999999999996, 'epoch': 3.81}
{'loss': 3.3231, 'learning_rate': 0.0002532, 'epoch': 3.82}
[WARNING|modeling_utils.py:388] 2022-03-28 19:42:09,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
38%|█████████████████████████████ | 425/1110 [2:40:30<4:30:21, 23.68s/it]
{'loss': 3.3202, 'learning_rate': 0.0002538, 'epoch': 3.83}
{'loss': 3.2468, 'learning_rate': 0.00025439999999999995, 'epoch': 3.83}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:42:42,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:42:50,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:42:56,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2139, 'learning_rate': 0.00025499999999999996, 'epoch': 3.84}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:01,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:43:05,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:09,554 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:43:13,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:17,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:19,757 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:43:23,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:27,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:29,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:31,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:33,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:35,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:37,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:38,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:40,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:42,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:45,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:47,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:49,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:50,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:52,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:55,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:56,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:43:59,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:00,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:03,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:05,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:07,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:10,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:11,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:13,613 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:15,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 2.1404, 'learning_rate': 0.0002586, 'epoch': 3.9}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:19,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:22,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:26,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:29,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:33,482 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:36,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:40,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:43,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.0964, 'learning_rate': 0.00025919999999999996, 'epoch': 3.91}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:47,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:50,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:54,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:44:57,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:01,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:45:04,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.1443, 'learning_rate': 0.00025979999999999997, 'epoch': 3.91}
{'loss': 4.5983, 'learning_rate': 0.0002604, 'epoch': 3.92}
{'loss': 4.1566, 'learning_rate': 0.000261, 'epoch': 3.93}
{'loss': 3.7846, 'learning_rate': 0.00026159999999999996, 'epoch': 3.94}
{'loss': 3.6671, 'learning_rate': 0.0002622, 'epoch': 3.95}
 40%|██████████████████████████████▏ | 440/1110 [2:45:34<4:25:55, 23.81s/it]
{'loss': 3.4806, 'learning_rate': 0.0002628, 'epoch': 3.96}
[WARNING|modeling_utils.py:388] 2022-03-28 19:47:38,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.5046, 'learning_rate': 0.00026339999999999995, 'epoch': 3.97}
[WARNING|modeling_utils.py:388] 2022-03-28 19:47:42,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:47:49,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:47:55,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.3366, 'learning_rate': 0.00026399999999999997, 'epoch': 3.98}
[WARNING|modeling_bart.py:1051] 2022-03-28 19:48:03,148 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:09,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:11,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:13,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:15,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:17,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:18,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:20,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:23,453 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:24,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:27,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:29,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:31,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:31,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:34,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:38,229 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:41,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:45,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:48,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:52,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.0943, 'learning_rate': 0.00026579999999999996, 'epoch': 4.01} [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.9256, 'learning_rate': 0.00026639999999999997, 'epoch': 4.02} [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.7867, 'learning_rate': 0.000267, 'epoch': 4.03} [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.6839, 'learning_rate': 0.0002676, 'epoch': 4.04} [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.6042, 'learning_rate': 0.00026819999999999996, 'epoch': 4.04} [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.3529, 'learning_rate': 0.0002688, 'epoch': 4.05} [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:48:56,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:16:48,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.3266, 'learning_rate': 0.0002694, 'epoch': 4.06}
{'loss': 3.2218, 'learning_rate': 0.00027, 'epoch': 4.07}
{'loss': 3.1004, 'learning_rate': 0.00027059999999999996, 'epoch': 4.08}
41%|███████████████████████████████ | 454/1110 [2:51:17<4:41:14, 25.72s/it]
{'loss': 3.0669, 'learning_rate': 0.0002712, 'epoch': 4.09}
41%|███████████████████████████████▏ | 455/1110 [2:51:44<4:43:59, 26.01s/it]
{'loss': 2.8998, 'learning_rate': 0.0002718, 'epoch': 4.1}
{'loss': 2.8966, 'learning_rate': 0.0002724, 'epoch': 4.11}
41%|███████████████████████████████▎ | 457/1110 [2:52:33<4:33:40, 25.15s/it]
{'loss': 2.6145, 'learning_rate': 0.0002736, 'epoch': 4.13}
{'loss': 2.5475, 'learning_rate': 0.0002742, 'epoch': 4.13}
{'loss': 2.5106, 'learning_rate': 0.0002742, 'epoch': 4.14}
[WARNING|modeling_utils.py:388] 2022-03-28 19:55:43,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:55:47,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.4872, 'learning_rate': 0.0002748, 'epoch': 4.15}
42%|███████████████████████████████▋ | 462/1110 [2:54:27<4:08:29, 23.01s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 19:56:13,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:56:24,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:56:30,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.2508, 'learning_rate': 0.000276, 'epoch': 4.17}
[WARNING|modeling_utils.py:388] 2022-03-28 19:56:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:56:42,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:56:45,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:56:48,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:56:52,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:56:54,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:56:57,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 19:57:00,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:04,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:06,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▊ | 465/1110 [2:55:24<3:35:18, 20.03s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:08,525 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:10,458 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:12,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:14,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:16,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:17,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:19,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▉ | 466/1110 [2:55:39<3:17:56, 18.44s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:23,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:24,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:26,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:29,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:30,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:35,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|███████████████████████████████▉ | 467/1110 [2:55:52<3:01:49, 16.97s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:36,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:39,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:40,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:42,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:44,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:45,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:46,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:49,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:50,827 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████ | 469/1110 [2:56:08<2:11:32, 12.31s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:53,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:57:57,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:00,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:04,443 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:07,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:11,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:15,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:18,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████▏ | 470/1110 [2:56:37<3:04:09, 17.26s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:25,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:29,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:32,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:36,100 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:39,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 19:58:42,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
42%|████████████████████████████████▏ | 471/1110 [2:57:05<3:37:23, 20.41s/it]
{'loss': 4.6758, 'learning_rate': 0.0002808, 'epoch': 4.24}
[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.0751, 'learning_rate': 0.00028139999999999996, 'epoch': 4.25} [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.7796, 'learning_rate': 0.00028199999999999997, 'epoch': 4.26} [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 19:58:57,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.4917, 'learning_rate': 0.0002826, 'epoch': 4.27}
[WARNING|modeling_utils.py:388] 2022-03-28 20:00:38,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2089, 'learning_rate': 0.00028319999999999994, 'epoch': 4.28}
43%|████████████████████████████████▌ | 476/1110 [2:59:21<4:32:02, 25.75s/it]
{'loss': 3.0105, 'learning_rate': 0.00028379999999999996, 'epoch': 4.29}
{'loss': 2.9742, 'learning_rate': 0.0002844, 'epoch': 4.3}
{'loss': 2.8292, 'learning_rate': 0.000285, 'epoch': 4.3}
{'loss': 2.7116, 'learning_rate': 0.00028559999999999995, 'epoch': 4.31}
{'loss': 2.5631, 'learning_rate': 0.00028619999999999996, 'epoch': 4.32}
{'loss': 2.5057, 'learning_rate': 0.0002868, 'epoch': 4.33}
{'loss': 2.4719, 'learning_rate': 0.00028739999999999994, 'epoch': 4.34}
[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.2309, 'learning_rate': 0.00028799999999999995, 'epoch': 4.35}
{'loss': 2.2296, 'learning_rate': 0.00028859999999999997, 'epoch': 4.36}
[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.9529, 'learning_rate': 0.0002892, 'epoch': 4.37} [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:03:52,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:08,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:08,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:05:08,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:05:12,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:26,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:26,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:26,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:26,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.9794, 'learning_rate': 0.00029039999999999996, 'epoch': 4.39} [WARNING|modeling_utils.py:388] 2022-03-28 20:05:34,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:34,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:05:34,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:40,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:40,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:40,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:46,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:46,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 44%|█████████████████████████████████▍ | 488/1110 [3:04:07<3:48:20, 22.03s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
44%|█████████████████████████████████▍ | 488/1110 [3:04:07<3:48:20, 22.03s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:53,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:05:53,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:05:57,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:05:57,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:05:57,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:05:57,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:05,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:07,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:11,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:13,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:15,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:06:18,020 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:21,491 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:23,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:25,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:27,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:29,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:31,394 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:33,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:35,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:36,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:38,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:42,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:43,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:45,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:48,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:49,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:54,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:55,572 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:06:58,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:00,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:02,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:03,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:05,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:08,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:09,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:10,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:13,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:20,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:24,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:27,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:31,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:35,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:38,766 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 6.105, 'learning_rate': 0.00029519999999999997, 'epoch': 4.46}
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:42,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:45,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:45,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:49,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.909, 'learning_rate': 0.0002958, 'epoch': 4.47} [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:07:56,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.761, 'learning_rate': 0.0002964, 'epoch': 4.48}
{'loss': 3.2873, 'learning_rate': 0.00029699999999999996, 'epoch': 4.48}
{'loss': 2.961, 'learning_rate': 0.00029759999999999997, 'epoch': 4.49}
 45%|██████████████████████████████████▏ | 499/1110 [3:07:49<4:19:53, 25.52s/it]
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/28/2022 20:15:36 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 2.6299471855163574, 'eval_wer': 1.4451408171360571, 'eval_runtime': 336.2822, 'eval_samples_per_second': 7.856, 'eval_steps_per_second': 0.494, 'epoch': 4.5}
03/28/2022 20:16:48 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220328_170142-by95ehra/run-by95ehra.wandb']. This may take a bit of time if the files are large.
{'loss': 2.3839, 'learning_rate': 0.0002988, 'epoch': 4.51}
{'loss': 2.2854, 'learning_rate': 0.00029939999999999996, 'epoch': 4.52}
{'loss': 2.1582, 'learning_rate': 0.0003, 'epoch': 4.53}
{'loss': 1.9157, 'learning_rate': 0.0002995081967213115, 'epoch': 4.54}
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.851, 'learning_rate': 0.0002990163934426229, 'epoch': 4.55} [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.8247, 'learning_rate': 0.00029852459016393437, 'epoch': 4.56} [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.7489, 'learning_rate': 0.00029803278688524587, 'epoch': 4.57} [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.6941, 'learning_rate': 0.00029754098360655737, 'epoch': 4.57} [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.513, 'learning_rate': 0.0002970491803278688, 'epoch': 4.58} [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.401, 'learning_rate': 0.0002965573770491803, 'epoch': 4.59} [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 20:09:59,807 >> Num examples = 2642 | 499/1110 [3:07:49<4:19:53, 25.52s/it]g-point operations will not be computed-28 19:58:22,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 20:21:46,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:21:50,610 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.3516, 'learning_rate': 0.0002960655737704918, 'epoch': 4.6}
[WARNING|modeling_utils.py:388] 2022-03-28 20:21:54,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:21:58,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:22:09,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
46%|███████████████████████████████████ | 512/1110 [3:20:29<4:17:01, 25.79s/it]
{'loss': 1.27, 'learning_rate': 0.00029557377049180326, 'epoch': 4.61}
[WARNING|modeling_utils.py:388] 2022-03-28 20:22:27,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:22:33,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.2286, 'learning_rate': 0.0002950819672131147, 'epoch': 4.62}
[WARNING|modeling_utils.py:388] 2022-03-28 20:22:39,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:44,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 20:22:48,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:52,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.1139, 'learning_rate': 0.0002945901639344262, 'epoch': 4.63}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:22:58,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:00,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:02,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:04,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:06,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:08,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
46%|███████████████████████████████████▎ | 515/1110 [3:21:26<3:28:12, 21.00s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:10,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:12,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:14,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:16,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:18,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:19,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:21,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
46%|███████████████████████████████████▎ | 516/1110 [3:21:41<3:08:51, 19.08s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:25,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:26,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:28,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:31,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:32,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:36,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
47%|███████████████████████████████████▍ | 517/1110 [3:21:54<2:51:39, 17.37s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:38,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:40,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:43,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:45,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
47%|███████████████████████████████████▍ | 518/1110 [3:22:03<2:27:40, 14.97s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:47,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:49,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:51,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:52,959 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
47%|███████████████████████████████████▌ | 519/1110 [3:22:10<2:04:15, 12.62s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:55,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:23:59,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:03,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:06,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:10,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:13,960 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:17,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:21,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
47%|███████████████████████████████████▌ | 520/1110 [3:22:40<2:53:03, 17.60s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:28,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:31,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:35,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:38,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.7027, 'learning_rate': 0.00029114754098360655, 'epoch': 4.69}
{'loss': 3.6879, 'learning_rate': 0.000290655737704918, 'epoch': 4.7}
{'loss': 2.9378, 'learning_rate': 0.00029016393442622945, 'epoch': 4.71}
{'loss': 2.511, 'learning_rate': 0.00028967213114754095, 'epoch': 4.72}
{'loss': 2.1899, 'learning_rate': 0.00028918032786885245, 'epoch': 4.73}
{'loss': 1.9893, 'learning_rate': 0.0002886885245901639, 'epoch': 4.74}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:24:42,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|████████████████████████████████████ | 527/1110 [3:25:50<4:11:02, 25.84s/it] Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|████████████████████████████████████ | 527/1110 [3:25:50<4:11:02, 25.84s/it] Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.7267, 'learning_rate': 0.0002881967213114754, 'epoch': 4.74} 47%|████████████████████████████████████ | 527/1110 [3:25:50<4:11:02, 25.84s/it] Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 47%|████████████████████████████████████ | 527/1110 [3:25:50<4:11:02, 25.84s/it] Setting `use_cache=False`...1] 2022-03-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.5322, 'learning_rate': 0.00028770491803278684, 'epoch': 4.75}
{'loss': 1.4467, 'learning_rate': 0.00028721311475409834, 'epoch': 4.76}
{'loss': 1.3211, 'learning_rate': 0.0002867213114754098, 'epoch': 4.77}
{'loss': 1.1658, 'learning_rate': 0.0002862295081967213, 'epoch': 4.78}
{'loss': 1.1107, 'learning_rate': 0.0002857377049180328, 'epoch': 4.79}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:30:01,845 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.0426, 'learning_rate': 0.00028524590163934424, 'epoch': 4.8}
[WARNING|modeling_utils.py:388] 2022-03-28 20:30:09,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.1146, 'learning_rate': 0.0002847540983606557, 'epoch': 4.81}
[WARNING|modeling_utils.py:388] 2022-03-28 20:30:43,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|████████████████████████████████████▋ | 535/1110 [3:29:06<3:49:04, 23.90s/it]
{'loss': 0.9805, 'learning_rate': 0.0002837704918032787, 'epoch': 4.83}
[WARNING|modeling_utils.py:388] 2022-03-28 20:31:23,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:31:34,218 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:31:38,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:31:42,190 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:31:48,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|████████████████████████████████████▊ | 538/1110 [3:30:12<3:33:16, 22.37s/it]
{'loss': 0.825, 'learning_rate': 0.00028278688524590163, 'epoch': 4.84}
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:02,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:04,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:10,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:12,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:15,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8417, 'learning_rate': 0.0002822950819672131, 'epoch': 4.85}
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:20,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:22,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:24,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:26,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:29,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:30,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:33,076 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:34,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:36,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:38,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:40,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:42,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:45,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:47,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:48,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:50,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:53,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:54,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:57,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:32:59,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:00,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:02,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:05,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:07,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:09,598 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:11,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:13,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:15,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:18,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:21,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:25,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:29,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:32,499 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:35,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:39,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:42,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:46,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:49,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:53,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:33:56,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:34:00,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:34:03,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:34:06,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.4873, 'learning_rate': 0.0002788524590163934, 'epoch': 4.91}
{'loss': 1.9983, 'learning_rate': 0.00027836065573770487, 'epoch': 4.92}
{'loss': 1.6671, 'learning_rate': 0.00027786885245901637, 'epoch': 4.93}
{'loss': 1.535, 'learning_rate': 0.00027737704918032787, 'epoch': 4.94}
{'loss': 1.2795, 'learning_rate': 0.0002768852459016393, 'epoch': 4.95}
{'loss': 1.1159, 'learning_rate': 0.00027639344262295076, 'epoch': 4.96}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:36:41,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.9898, 'learning_rate': 0.00027590163934426227, 'epoch': 4.97}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:36:51,364 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:36:57,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.8807, 'learning_rate': 0.00027540983606557377, 'epoch': 4.98}
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:04,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:06,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:08,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:10,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:12,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:14,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:16,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:18,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:19,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:22,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:25,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:27,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:28,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:31,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:32,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:35,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:38,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:42,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:45,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:49,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:37:53,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.9571, 'learning_rate': 0.00027393442622950816, 'epoch': 5.01}
{'loss': 1.5384, 'learning_rate': 0.00027344262295081966, 'epoch': 5.02}
{'loss': 1.2256, 'learning_rate': 0.0002729508196721311, 'epoch': 5.03}
{'loss': 1.0547, 'learning_rate': 0.0002724590163934426, 'epoch': 5.04}
{'loss': 0.9737, 'learning_rate': 0.0002719672131147541, 'epoch': 5.04}
g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.8202, 'learning_rate': 0.00027147540983606556, 'epoch': 5.05} g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.7816, 'learning_rate': 0.000270983606557377, 'epoch': 5.06} g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
51%|██████████████████████████████████████▌ | 563/1110 [3:39:28<3:58:59, 26.21s/it]
{'loss': 0.6287, 'learning_rate': 0.00027, 'epoch': 5.08}
{'loss': 0.5932, 'learning_rate': 0.00026950819672131145, 'epoch': 5.09}
51%|██████████████████████████████████████▊ | 566/1110 [3:40:44<3:52:11, 25.61s/it]
{'loss': 0.5249, 'learning_rate': 0.0002690163934426229, 'epoch': 5.1}
51%|██████████████████████████████████████▊ | 567/1110 [3:41:10<3:54:19, 25.89s/it]
{'loss': 0.5345, 'learning_rate': 0.0002685245901639344, 'epoch': 5.11}
[WARNING|modeling_utils.py:388] 2022-03-28 20:43:11,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.5351, 'learning_rate': 0.0002680327868852459, 'epoch': 5.12}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:43:25,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.4818, 'learning_rate': 0.00026754098360655734, 'epoch': 5.13}
51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 570/1110 [3:42:22<3:39:57, 24.44s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 571/1110 [3:42:45<3:35:31, 23.99s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 571/1110 [3:42:45<3:35:31, 23.99s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.4339, 'learning_rate': 0.00026655737704918035, 'epoch': 5.14} 51%|███████████████████████████████████████ | 571/1110 [3:42:45<3:35:31, 23.99s/it] Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 20:44:35,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:44:49,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 20:44:49,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 20:24:24,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.4029, 'learning_rate': 0.0002660655737704918, 'epoch': 5.15}
[WARNING|modeling_utils.py:388] 2022-03-28 20:44:57,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:45:09,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
52%|███████████████████████████████████████▏ | 573/1110 [3:43:30<3:28:41, 23.32s/it]
{'loss': 0.4651, 'learning_rate': 0.00026557377049180324, 'epoch': 5.16}
[WARNING|modeling_utils.py:388] 2022-03-28 20:45:24,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:28,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
52%|███████████████████████████████████████▎ | 574/1110 [3:43:50<3:20:12, 22.41s/it]
{'loss': 0.3721, 'learning_rate': 0.00026508196721311474, 'epoch': 5.17}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:38,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:40,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 20:45:44,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:48,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:51,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
52%|███████████████████████████████████████▎ | 575/1110 [3:44:09<3:09:43, 21.28s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 20:45:54,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 20:45:58,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:00,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:03,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:05,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:07,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:09,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:11,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:13,019 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:14,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:16,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:18,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:20,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:23,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:25,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:26,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:29,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:31,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:32,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:35,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:37,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:39,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:41,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:42,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:46,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:48,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:49,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:51,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:53,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:46:56,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:00,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:03,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:07,582 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:11,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:14,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:18,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:21,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.1772, 'learning_rate': 0.0002616393442622951, 'epoch': 5.23}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:25,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:29,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:32,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:36,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:39,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 20:47:43,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
52%|███████████████████████████████████████▊ | 582/1110 [3:46:07<3:01:13, 20.59s/it]
{'loss': 2.0168, 'learning_rate': 0.00026114754098360653, 'epoch': 5.24}
{'loss': 1.2479, 'learning_rate': 0.000260655737704918, 'epoch': 5.25}
{'loss': 1.2017, 'learning_rate': 0.0002601639344262295, 'epoch': 5.26}
{'loss': 1.1079, 'learning_rate': 0.000259672131147541, 'epoch': 5.27}
{'loss': 0.8699, 'learning_rate': 0.0002591803278688524, 'epoch': 5.28}
{'loss': 0.7881, 'learning_rate': 0.0002586885245901639, 'epoch': 5.29}
{'loss': 0.7279, 'learning_rate': 0.00025819672131147537, 'epoch': 5.3}
{'loss': 0.6572, 'learning_rate': 0.00025770491803278687, 'epoch': 5.3}
[WARNING|modeling_utils.py:388] 2022-03-28 20:51:24,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.6395, 'learning_rate': 0.0002572131147540983, 'epoch': 5.31}
{'loss': 0.5414, 'learning_rate': 0.0002567213114754098, 'epoch': 5.32}
{'loss': 0.5121, 'learning_rate': 0.0002562295081967213, 'epoch': 5.33}
{'loss': 0.4897, 'learning_rate': 0.00025573770491803277, 'epoch': 5.34}
{'loss': 0.4312, 'learning_rate': 0.0002552459016393442, 'epoch': 5.35}
{'loss': 0.4397, 'learning_rate': 0.0002547540983606557, 'epoch': 5.36}
{'loss': 0.3896, 'learning_rate': 0.0002542622950819672, 'epoch': 5.37}
[WARNING|modeling_utils.py:388] 2022-03-28 20:54:01,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:54:09,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3908, 'learning_rate': 0.00025377049180327866, 'epoch': 5.38}
[WARNING|modeling_utils.py:388] 2022-03-28 20:54:18,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:54:30,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3858, 'learning_rate': 0.00025327868852459016, 'epoch': 5.39}
[WARNING|modeling_utils.py:388] 2022-03-28 20:54:44,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:54:50,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3354, 'learning_rate': 0.0002527868852459016, 'epoch': 5.39}
[WARNING|modeling_utils.py:388] 2022-03-28 20:54:57,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:03,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 20:55:07,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:11,561 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3314, 'learning_rate': 0.0002522950819672131, 'epoch': 5.4}
[WARNING|modeling_bart.py:1051] 2022-03-28 20:55:15,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:19,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:21,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:23,730 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:25,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:27,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:29,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:31,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:33,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:35,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:37,451 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:39,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:40,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:42,692 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:44,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:47,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:49,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:52,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:53,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:56,286 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:55:57,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:00,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:02,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:05,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:07,894 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:09,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:12,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:13,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3007, 'learning_rate': 0.00024983606557377045, 'epoch': 5.45}
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:17,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:21,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:24,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:28,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:32,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:35,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:39,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:42,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.9302, 'learning_rate': 0.00024934426229508195, 'epoch': 5.46}
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:46,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:49,793 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:53,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:56:56,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:57:00,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 20:57:03,611 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.4471, 'learning_rate': 0.0002488524590163934, 'epoch': 5.47}
{'loss': 1.0509, 'learning_rate': 0.0002483606557377049, 'epoch': 5.48}
{'loss': 0.9157, 'learning_rate': 0.0002478688524590164, 'epoch': 5.48}
{'loss': 0.8162, 'learning_rate': 0.00024737704918032785, 'epoch': 5.49}
{'loss': 0.6961, 'learning_rate': 0.0002468852459016393, 'epoch': 5.5}
{'loss': 0.6048, 'learning_rate': 0.0002463934426229508, 'epoch': 5.51}
{'loss': 0.6278, 'learning_rate': 0.0002459016393442623, 'epoch': 5.52}
55%|██████████████████████████████████████████ | 614/1110 [3:58:34<3:33:02, 25.77s/it]
{'loss': 0.5126, 'learning_rate': 0.00024540983606557374, 'epoch': 5.53}
{'loss': 0.4451, 'learning_rate': 0.0002449180327868852, 'epoch': 5.54}
{'loss': 0.4496, 'learning_rate': 0.0002444262295081967, 'epoch': 5.55}
{'loss': 0.3534, 'learning_rate': 0.00024393442622950816, 'epoch': 5.56}
{'loss': 0.3675, 'learning_rate': 0.00024344262295081966, 'epoch': 5.57}
56%|██████████████████████████████████████████▍ | 619/1110 [4:00:39<3:22:57, 24.80s/it]
{'loss': 0.3711, 'learning_rate': 0.0002424590163934426, 'epoch': 5.58}
{'loss': 0.3521, 'learning_rate': 0.00024196721311475406, 'epoch': 5.59}
{'loss': 0.3142, 'learning_rate': 0.00024147540983606556, 'epoch': 5.6}
[WARNING|modeling_utils.py:388] 2022-03-28 21:03:36,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:03:40,671 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:03:52,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:03:57,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:04:07,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:04:13,445 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2855, 'learning_rate': 0.0002404918032786885, 'epoch': 5.62}
[WARNING|modeling_utils.py:388] 2022-03-28 21:04:19,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:04:25,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:29,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:04:33,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3093, 'learning_rate': 0.00023999999999999998, 'epoch': 5.63}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:38,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:40,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:45,567 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:47,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:49,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
56%|██████████████████████████████████████████▊ | 626/1110 [4:03:07<2:42:50, 20.19s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:51,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:53,746 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:55,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:57,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:04:59,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:01,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:02,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
56%|██████████████████████████████████████████▉ | 627/1110 [4:03:22<2:29:13, 18.54s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:06,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:07,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:09,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:12,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:14,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:15,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|██████████████████████████████████████████▉ | 628/1110 [4:03:34<2:13:26, 16.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:18,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:19,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:22,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:24,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:29,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:30,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:32,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:34,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▏ | 630/1110 [4:03:52<1:40:30, 12.56s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:37,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:40,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:44,601 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:48,225 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:51,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:55,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:05:58,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:02,479 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▏ | 631/1110 [4:04:21<2:19:49, 17.51s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:09,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:13,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:16,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:06:20,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.832, 'learning_rate': 0.0002365573770491803, 'epoch': 5.69}
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it]
{'loss': 0.6695, 'learning_rate': 0.00023606557377049177, 'epoch': 5.7}
{'loss': 0.6161, 'learning_rate': 0.00023557377049180327, 'epoch': 5.71}
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.5685, 'learning_rate': 0.00023508196721311474, 'epoch': 5.72} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.5094, 'learning_rate': 0.00023459016393442622, 'epoch': 5.73} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.4618, 'learning_rate': 0.00023409836065573766, 'epoch': 5.74} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.4221, 'learning_rate': 0.00023360655737704916, 'epoch': 5.74} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3998, 'learning_rate': 0.00023311475409836064, 'epoch': 5.75} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.369, 'learning_rate': 0.0002326229508196721, 'epoch': 5.76} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3503, 'learning_rate': 0.00023213114754098358, 'epoch': 5.77} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.3426, 'learning_rate': 0.00023163934426229506, 'epoch': 5.78} 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 57%|███████████████████████████████████████████▎ | 633/1110 [4:05:16<3:00:07, 22.66s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3081, 'learning_rate': 0.00023114754098360653, 'epoch': 5.79} 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.321, 'learning_rate': 0.000230655737704918, 'epoch': 5.8} 58%|████████████████████████████████████████████ | 643/1110 [4:09:37<3:16:22, 25.23s/it] Setting `use_cache=False`...1] 2022-03-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 21:11:51,656 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3081, 'learning_rate': 0.00023016393442622948, 'epoch': 5.81}
58%|████████████████████████████████████████████▏ | 646/1110 [4:10:46<3:04:34, 23.87s/it]
{'loss': 0.3378, 'learning_rate': 0.00022967213114754098, 'epoch': 5.82}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:12:36,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:12:42,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
58%|████████████████████████████████████████████▎ | 647/1110 [4:11:08<2:59:45, 23.29s/it]
{'loss': 0.2946, 'learning_rate': 0.00022918032786885245, 'epoch': 5.83}
[WARNING|modeling_utils.py:388] 2022-03-28 21:12:59,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:13:11,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
58%|████████████████████████████████████████████▎ | 648/1110 [4:11:31<2:58:29, 23.18s/it]
{'loss': 0.3051, 'learning_rate': 0.0002286885245901639, 'epoch': 5.83}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:23,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:13:27,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:33,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|████████████████████████████████████████████▍ | 649/1110 [4:11:51<2:51:05, 22.27s/it]
{'loss': 0.2399, 'learning_rate': 0.00022819672131147537, 'epoch': 5.84}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:39,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:42,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:48,002 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:50,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:52,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
59%|████████████████████████████████████████████▌ | 650/1110 [4:12:10<2:42:34, 21.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:13:58,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:00,233 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:02,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:04,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:06,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:08,519 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:10,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:12,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:14,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:16,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:18,009 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:19,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:21,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:24,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:26,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:28,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:31,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:32,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:35,272 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:36,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:39,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:40,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:42,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:43,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:47,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:49,189 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:50,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:53,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:54,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:14:57,578 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:01,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:04,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:08,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:11,749 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:15,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:18,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:22,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:22,058 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.8577, 'learning_rate': 0.00022475409836065572, 'epoch': 5.91} [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:25,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:25,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:29,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:32,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:32,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:35,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 21:15:35,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:39,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:42,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:42,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:15:45,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:06:06,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.6569, 'learning_rate': 0.0002242622950819672, 'epoch': 5.91}
{'loss': 0.5387, 'learning_rate': 0.0002237704918032787, 'epoch': 5.92}
{'loss': 0.4774, 'learning_rate': 0.00022327868852459014, 'epoch': 5.93}
{'loss': 0.4175, 'learning_rate': 0.0002227868852459016, 'epoch': 5.94}
{'loss': 0.3658, 'learning_rate': 0.00022229508196721309, 'epoch': 5.95}
[WARNING|modeling_utils.py:388] 2022-03-28 21:17:49,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:17:55,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3594, 'learning_rate': 0.00022180327868852459, 'epoch': 5.96}
60%|█████████████████████████████████████████████▍ | 663/1110 [4:16:36<2:55:39, 23.58s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3114, 'learning_rate': 0.00022131147540983606, 'epoch': 5.97}
[WARNING|modeling_utils.py:388] 2022-03-28 21:18:26,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:34,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:18:38,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2732, 'learning_rate': 0.0002208196721311475, 'epoch': 5.98}
[WARNING|modeling_utils.py:388] 2022-03-28 21:18:44,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:18:46,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:50,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:52,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:54,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:56,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:58,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:18:59,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:01,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:04,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:06,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:08,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:09,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:12,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:16,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:20,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:23,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:27,411 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:19:30,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.6729, 'learning_rate': 0.00021934426229508195, 'epoch': 6.01}
60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it]
{'loss': 0.4497, 'learning_rate': 0.00021885245901639343, 'epoch': 6.02}
{'loss': 0.3901, 'learning_rate': 0.0002183606557377049, 'epoch': 6.03}
{'loss': 0.3872, 'learning_rate': 0.00021786885245901638, 'epoch': 6.04}
{'loss': 0.3052, 'learning_rate': 0.00021737704918032785, 'epoch': 6.04}
{'loss': 0.2726, 'learning_rate': 0.00021688524590163932, 'epoch': 6.05}
{'loss': 0.2653, 'learning_rate': 0.0002163934426229508, 'epoch': 6.06}
{'loss': 0.2387, 'learning_rate': 0.0002159016393442623, 'epoch': 6.07}
{'loss': 0.2052, 'learning_rate': 0.00021540983606557374, 'epoch': 6.08}
{'loss': 0.2342, 'learning_rate': 0.00021491803278688522, 'epoch': 6.09}
{'loss': 0.1993, 'learning_rate': 0.0002144262295081967, 'epoch': 6.1}
60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 60%|█████████████████████████████████████████████▋ | 668/1110 [4:18:26<2:59:28, 24.36s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1983, 'learning_rate': 0.0002139344262295082, 'epoch': 6.11} 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1869, 'learning_rate': 0.00021344262295081967, 'epoch': 6.12} 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1725, 'learning_rate': 0.00021295081967213114, 'epoch': 6.13} 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 61%|██████████████████████████████████████████████▍ | 678/1110 [4:22:48<3:01:58, 25.28s/it] Setting `use_cache=False`...e computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:38,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1686, 'learning_rate': 0.00021245901639344259, 'epoch': 6.13} [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:18:20,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1684, 'learning_rate': 0.0002119672131147541, 'epoch': 6.14}
[WARNING|modeling_utils.py:388] 2022-03-28 21:25:48,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:26:29,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1432, 'learning_rate': 0.00021147540983606556, 'epoch': 6.15}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:26:29,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:26:47,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
62%|██████████████████████████████████████████████▊ | 684/1110 [4:25:08<2:42:04, 22.83s/it]
{'loss': 0.1873, 'learning_rate': 0.00021098360655737703, 'epoch': 6.16}
62%|██████████████████████████████████████████████▊ | 684/1110 [4:25:08<2:42:04, 22.83s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 21:26:58,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:06,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:12,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1583, 'learning_rate': 0.0002104918032786885, 'epoch': 6.17}
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:12,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:19,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:22,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:26,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:30,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1472, 'learning_rate': 0.00020999999999999998, 'epoch': 6.18}
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:36,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:38,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:40,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:43,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:45,197 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:47,244 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:27:49,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1488, 'learning_rate': 0.00020950819672131146, 'epoch': 6.19}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:53,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:54,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:56,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:27:58,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:00,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:02,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|███████████████████████████████████████████████ | 688/1110 [4:26:21<2:11:05, 18.64s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:05,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:07,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:08,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:10,371 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:13,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:14,770 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|███████████████████████████████████████████████▏ | 689/1110 [4:26:33<1:56:59, 16.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:17,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:18,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:21,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:23,592 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:25,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|███████████████████████████████████████████████▏ | 690/1110 [4:26:43<1:41:23, 14.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:26,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:29,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:31,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:33,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|███████████████████████████████████████████████▎ | 691/1110 [4:26:51<1:28:28, 12.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:36,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:40,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:43,730 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:47,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:50,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:54,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:28:58,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:01,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
62%|███████████████████████████████████████████████▍ | 692/1110 [4:27:20<2:02:39, 17.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:08,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:12,224 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:15,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:19,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:29:22,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.6188, 'learning_rate': 0.0002065573770491803, 'epoch': 6.24}
{'loss': 0.423, 'learning_rate': 0.0002060655737704918, 'epoch': 6.25}
{'loss': 0.3769, 'learning_rate': 0.00020557377049180327, 'epoch': 6.26}
{'loss': 0.376, 'learning_rate': 0.00020508196721311475, 'epoch': 6.27}
{'loss': 0.2984, 'learning_rate': 0.0002045901639344262, 'epoch': 6.28}
63%|███████████████████████████████████████████████▊ | 698/1110 [4:30:04<2:58:30, 26.00s/it]
{'loss': 0.2631, 'learning_rate': 0.0002040983606557377, 'epoch': 6.29}
[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.254, 'learning_rate': 0.00020360655737704917, 'epoch': 6.3}
[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.252, 'learning_rate': 0.00020311475409836064, 'epoch': 6.3} [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:31:52,886 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
 63%|███████████████████████████████████████████████▉ | 701/1110 [4:31:21<2:55:01, 25.68s/it]
{'loss': 0.2213, 'learning_rate': 0.00020262295081967211, 'epoch': 6.31}
{'loss': 0.2044, 'learning_rate': 0.00020213114754098356, 'epoch': 6.32}
{'loss': 0.1789, 'learning_rate': 0.00020163934426229506, 'epoch': 6.33}
{'loss': 0.2126, 'learning_rate': 0.00020114754098360653, 'epoch': 6.34}
{'loss': 0.1658, 'learning_rate': 0.000200655737704918, 'epoch': 6.35}
[WARNING|modeling_utils.py:388] 2022-03-28 21:35:00,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1732, 'learning_rate': 0.0002001639344262295, 'epoch': 6.36}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:35:17,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:35:23,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1748, 'learning_rate': 0.00019967213114754098, 'epoch': 6.37}
[WARNING|modeling_utils.py:388] 2022-03-28 21:35:46,351 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:35:50,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1638, 'learning_rate': 0.00019918032786885243, 'epoch': 6.38}
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:12,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.157, 'learning_rate': 0.0001986885245901639, 'epoch': 6.39}
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:16,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:23,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:31,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:37,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:43,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:36:47,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:51,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:36:54,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1552, 'learning_rate': 0.00019770491803278688, 'epoch': 6.4}
{'loss': 0.1373, 'learning_rate': 0.00019524590163934425, 'epoch': 6.45}
{'loss': 0.5316, 'learning_rate': 0.00019475409836065572, 'epoch': 6.46}
{'loss': 0.4373, 'learning_rate': 0.00019426229508196722, 'epoch': 6.47}
{'loss': 0.3649, 'learning_rate': 0.00019377049180327867, 'epoch': 6.48}
{'loss': 0.3233, 'learning_rate': 0.00019327868852459014, 'epoch': 6.48}
{'loss': 0.2931, 'learning_rate': 0.00019278688524590161, 'epoch': 6.49}
{'loss': 0.2771, 'learning_rate': 0.00019229508196721312, 'epoch': 6.5}
{'loss': 0.2295, 'learning_rate': 0.0001918032786885246, 'epoch': 6.51}
{'loss': 0.2316, 'learning_rate': 0.00019131147540983604, 'epoch': 6.52}
{'loss': 0.2157, 'learning_rate': 0.0001908196721311475, 'epoch': 6.53}
{'loss': 0.2125, 'learning_rate': 0.000190327868852459, 'epoch': 6.54}
{'loss': 0.1578, 'learning_rate': 0.00018983606557377048, 'epoch': 6.55}
{'loss': 0.1633, 'learning_rate': 0.00018934426229508196, 'epoch': 6.56}
{'loss': 0.1553, 'learning_rate': 0.00018885245901639343, 'epoch': 6.57}
66%|█████████████████████████████████████████████████▉ | 730/1110 [4:42:21<2:38:13, 24.98s/it]
{'loss': 0.1544, 'learning_rate': 0.00018836065573770488, 'epoch': 6.57}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:44:32,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:44:38,503 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:44:48,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1519, 'learning_rate': 0.00018737704918032785, 'epoch': 6.59}
[WARNING|modeling_utils.py:388] 2022-03-28 21:45:09,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:45:13,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1557, 'learning_rate': 0.00018688524590163933, 'epoch': 6.6}
[WARNING|modeling_utils.py:388] 2022-03-28 21:45:17,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1656, 'learning_rate': 0.00018639344262295083, 'epoch': 6.61}
[WARNING|modeling_utils.py:388] 2022-03-28 21:45:39,869 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:45:56,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1322, 'learning_rate': 0.00018590163934426227, 'epoch': 6.62}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:02,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:06,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:10,746 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:14,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:18,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:46:20,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:24,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:26,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:28,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:30,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:32,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:34,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:36,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:38,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:40,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:41,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:45,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:46,929 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:48,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:50,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:53,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:54,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:57,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:46:58,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:01,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:03,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:05,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:07,680 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:09,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:11,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:12,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:15,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:16,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:20,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:23,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:27,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:30,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:34,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:38,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:41,546 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:45,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:48,628 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:52,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:55,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:47:59,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:48:02,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:48:06,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]
{'loss': 0.4009, 'learning_rate': 0.0001819672131147541, 'epoch': 6.69}
{'loss': 0.3291, 'learning_rate': 0.00018147540983606556, 'epoch': 6.7}
{'loss': 0.2757, 'learning_rate': 0.00018098360655737704, 'epoch': 6.71}
{'loss': 0.2992, 'learning_rate': 0.00018049180327868848, 'epoch': 6.72}
{'loss': 0.2224, 'learning_rate': 0.00017999999999999998, 'epoch': 6.73}
{'loss': 0.2268, 'learning_rate': 0.00017950819672131146, 'epoch': 6.74}
{'loss': 0.2091, 'learning_rate': 0.00017901639344262293, 'epoch': 6.74}
{'loss': 0.2108, 'learning_rate': 0.00017852459016393443, 'epoch': 6.75}
{'loss': 0.1898, 'learning_rate': 0.00017803278688524588, 'epoch': 6.76}
{'loss': 0.1764, 'learning_rate': 0.00017754098360655735, 'epoch': 6.77}
{'loss': 0.1669, 'learning_rate': 0.00017704918032786883, 'epoch': 6.78}
{'loss': 0.1502, 'learning_rate': 0.00017655737704918033, 'epoch': 6.79}
67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 67%|██████████████████████████████████████████████████▊ | 743/1110 [4:46:29<2:05:40, 20.55s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]
{'loss': 0.1734, 'learning_rate': 0.0001760655737704918, 'epoch': 6.8}
68%|███████████████████████████████████████████████████▋ | 755/1110 [4:51:40<2:27:19, 24.90s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1519, 'learning_rate': 0.00017557377049180327, 'epoch': 6.81}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:53:47,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1561, 'learning_rate': 0.00017508196721311472, 'epoch': 6.82}
[WARNING|modeling_utils.py:388] 2022-03-28 21:53:59,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1471, 'learning_rate': 0.0001745901639344262, 'epoch': 6.83}
[WARNING|modeling_utils.py:388] 2022-03-28 21:54:25,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:54:41,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|███████████████████████████████████████████████████▉ | 759/1110 [4:53:10<2:13:13, 22.77s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 21:54:56,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:55:12,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
68%|████████████████████████████████████████████████████ | 760/1110 [4:53:32<2:12:00, 22.63s/it]
{'loss': 0.1161, 'learning_rate': 0.00017360655737704917, 'epoch': 6.84}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:20,662 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:23,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:55:26,999 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:55:32,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:55:35,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1166, 'learning_rate': 0.00017311475409836064, 'epoch': 6.85}
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:39,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:41,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:43,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:45,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:47,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:49,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:51,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:55:53,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 21:55:56,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:55:58,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:00,423 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:02,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:03,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:07,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:08,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:10,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:13,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:14,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:16,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:19,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:20,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:23,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:25,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:27,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:29,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:31,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:33,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:35,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:36,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:36,098 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:38,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:38,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:42,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:42,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:45,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:49,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:49,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:52,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:52,884 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 21:56:56,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 21:56:59,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 21:57:03,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 21:57:06,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:57:10,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 21:57:13,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 21:57:17,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 21:57:20,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.3025, 'learning_rate': 0.00016967213114754096, 'epoch': 6.91} [WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.2683, 'learning_rate': 0.00016918032786885243, 'epoch': 6.92} [WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.2023, 'learning_rate': 0.00016868852459016393, 'epoch': 6.93} [WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.184, 'learning_rate': 0.0001681967213114754, 'epoch': 6.94} [WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.1605, 'learning_rate': 0.00016770491803278688, 'epoch': 6.95} [WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:57:23,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 21:59:31,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 21:59:31,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.142, 'learning_rate': 0.00016721311475409833, 'epoch': 6.96} [WARNING|modeling_utils.py:388] 2022-03-28 21:59:31,998 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 21:59:48,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 21:59:52,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 21:59:56,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:00:00,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1362, 'learning_rate': 0.0001667213114754098, 'epoch': 6.97} [WARNING|modeling_bart.py:1051] 2022-03-28 22:00:00,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:00:00,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:00:11,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:00:16,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 22:00:21,046 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1242, 'learning_rate': 0.0001662295081967213, 'epoch': 6.98} [WARNING|modeling_utils.py:388] 2022-03-28 22:00:25,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:27,387 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:29,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:31,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:33,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:00:35,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:37,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:39,154 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:40,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:43,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:45,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:00:47,639 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:49,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:51,653 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:52,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:00:54,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:00:58,701 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:01:02,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:01:05,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:01:09,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:01:13,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3292, 'learning_rate': 0.00016475409836065575, 'epoch': 7.01}
{'loss': 0.1917, 'learning_rate': 0.0001642622950819672, 'epoch': 7.02}
{'loss': 0.1808, 'learning_rate': 0.00016377049180327867, 'epoch': 7.03}
70%|█████████████████████████████████████████████████████▍ | 781/1110 [5:01:02<2:21:54, 25.88s/it]
{'loss': 0.1459, 'learning_rate': 0.00016327868852459014, 'epoch': 7.04}
{'loss': 0.1363, 'learning_rate': 0.00016278688524590164, 'epoch': 7.04}
71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]
{'loss': 0.1299, 'learning_rate': 0.00016229508196721312, 'epoch': 7.05}
{'loss': 0.1337, 'learning_rate': 0.00016180327868852456, 'epoch': 7.06}
{'loss': 0.1175, 'learning_rate': 0.00016131147540983604, 'epoch': 7.07}
{'loss': 0.1139, 'learning_rate': 0.00016081967213114754, 'epoch': 7.08}
71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1132, 'learning_rate': 0.000160327868852459, 'epoch': 7.09} 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0938, 'learning_rate': 0.00015983606557377049, 'epoch': 7.1} 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0853, 'learning_rate': 0.00015934426229508193, 'epoch': 7.11} 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 71%|█████████████████████████████████████████████████████▌ | 783/1110 [5:01:56<2:22:57, 26.23s/it]g-point operations will not be computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
71%|██████████████████████████████████████████████████████ | 790/1110 [5:04:54<2:12:51, 24.91s/it]
{'loss': 0.0911, 'learning_rate': 0.0001588524590163934, 'epoch': 7.12}
{'loss': 0.0876, 'learning_rate': 0.0001583606557377049, 'epoch': 7.13}
71%|██████████████████████████████████████████████████████▏ | 792/1110 [5:05:43<2:10:13, 24.57s/it]
{'loss': 0.0952, 'learning_rate': 0.00015786885245901638, 'epoch': 7.13}
[WARNING|modeling_utils.py:388] 2022-03-28 22:07:35,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
71%|██████████████████████████████████████████████████████▎ | 793/1110 [5:06:06<2:06:56, 24.03s/it]
{'loss': 0.0965, 'learning_rate': 0.00015737704918032785, 'epoch': 7.14}
[WARNING|modeling_utils.py:388] 2022-03-28 22:07:53,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:08:02,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
72%|██████████████████████████████████████████████████████▎ | 794/1110 [5:06:28<2:03:15, 23.40s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 22:08:14,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:08:24,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:08:28,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
72%|██████████████████████████████████████████████████████▍ | 795/1110 [5:06:49<1:59:10, 22.70s/it]
{'loss': 0.0931, 'learning_rate': 0.0001563934426229508, 'epoch': 7.16}
[WARNING|modeling_utils.py:388] 2022-03-28 22:08:46,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:08:52,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0801, 'learning_rate': 0.00015590163934426228, 'epoch': 7.17}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:08:57,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:01,312 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:07,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:11,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:13,496 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.081, 'learning_rate': 0.00015540983606557375, 'epoch': 7.18}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:17,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:19,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:21,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:23,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:09:25,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:29,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:31,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:33,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:35,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:36,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:38,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:40,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:43,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:45,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:47,096 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:50,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:51,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:53,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:55,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:58,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:09:59,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:01,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:03,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:06,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:07,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:10,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:12,080 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:14,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:18,128 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:21,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:25,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:28,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:32,472 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:37,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:41,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:45,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:48,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:52,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:55,527 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:10:58,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:11:02,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:11:05,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2267, 'learning_rate': 0.0001519672131147541, 'epoch': 7.24}
73%|███████████████████████████████████████████████████████ | 805/1110 [5:09:54<1:55:25, 22.71s/it]
73%|███████████████████████████████████████████████████████▏ | 806/1110 [5:10:22<2:01:42, 24.02s/it]
{'loss': 0.18, 'learning_rate': 0.000150983606557377, 'epoch': 7.26}
73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]
{'loss': 0.1531, 'learning_rate': 0.0001504918032786885, 'epoch': 7.27}
{'loss': 0.1334, 'learning_rate': 0.00015, 'epoch': 7.28}
{'loss': 0.1211, 'learning_rate': 0.00014950819672131146, 'epoch': 7.29}
73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]
{'loss': 0.1199, 'learning_rate': 0.00014901639344262293, 'epoch': 7.3}
73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]
{'loss': 0.1123, 'learning_rate': 0.0001485245901639344, 'epoch': 7.3}
73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]
{'loss': 0.1059, 'learning_rate': 0.0001480327868852459, 'epoch': 7.31}
73%|███████████████████████████████████████████████████████▎ | 807/1110 [5:10:48<2:05:18, 24.81s/it]
{'loss': 0.0931, 'learning_rate': 0.00014754098360655736, 'epoch': 7.32}
73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]
{'loss': 0.0969, 'learning_rate': 0.00014704918032786886, 'epoch': 7.33}
73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]
{'loss': 0.0967, 'learning_rate': 0.0001465573770491803, 'epoch': 7.34}
73%|███████████████████████████████████████████████████████▋ | 813/1110 [5:13:24<2:05:50, 25.42s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 22:16:12,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0934, 'learning_rate': 0.00014606557377049178, 'epoch': 7.35}
74%|███████████████████████████████████████████████████████▊ | 816/1110 [5:14:38<2:03:10, 25.14s/it]
{'loss': 0.0883, 'learning_rate': 0.00014557377049180328, 'epoch': 7.36}
74%|███████████████████████████████████████████████████████▊ | 816/1110 [5:14:38<2:03:10, 25.14s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 22:17:03,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0828, 'learning_rate': 0.00014508196721311472, 'epoch': 7.37}
74%|████████████████████████████████████████████████████████ | 818/1110 [5:15:25<1:57:30, 24.14s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:17:19,593 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0905, 'learning_rate': 0.00014459016393442622, 'epoch': 7.38}
74%|████████████████████████████████████████████████████████ | 819/1110 [5:15:47<1:54:15, 23.56s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 22:17:37,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0853, 'learning_rate': 0.0001440983606557377, 'epoch': 7.39}
[WARNING|modeling_utils.py:388] 2022-03-28 22:18:04,178 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:18:10,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0755, 'learning_rate': 0.00014360655737704917, 'epoch': 7.39} [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:18,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:26,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 22:18:30,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:18:32,694 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0726, 'learning_rate': 0.00014311475409836065, 'epoch': 7.4} [WARNING|modeling_utils.py:388] 2022-03-28 22:18:38,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-28 22:18:40,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:18:42,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:46,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:48,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:50,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:52,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:18:54,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:56,743 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:18:58,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:00,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:02,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:04,104 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:07,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:09,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:10,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:12,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:15,150 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:16,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:19,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:20,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:22,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:25,160 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:27,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:29,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:31,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:33,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:34,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:37,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:41,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:19:45,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:48,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:52,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:19:55,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:01,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:04,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:08,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:12,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:15,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:19,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:22,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2098, 'learning_rate': 0.00013967213114754096, 'epoch': 7.47}
{'loss': 0.1766, 'learning_rate': 0.00013918032786885243, 'epoch': 7.48}
{'loss': 0.1588, 'learning_rate': 0.00013868852459016394, 'epoch': 7.48}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.143, 'learning_rate': 0.00013819672131147538, 'epoch': 7.49} [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1298, 'learning_rate': 0.00013770491803278688, 'epoch': 7.5} [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1163, 'learning_rate': 0.00013721311475409833, 'epoch': 7.51} [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1181, 'learning_rate': 0.00013672131147540983, 'epoch': 7.52} [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:20:26,082 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|█████████████████████████████████████████████████████████▏ | 836/1110 [5:21:57<1:58:22, 25.92s/it] Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 75%|█████████████████████████████████████████████████████████▏ | 836/1110 [5:21:57<1:58:22, 25.92s/it] Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.116, 'learning_rate': 0.0001362295081967213, 'epoch': 7.53} 75%|█████████████████████████████████████████████████████████▏ | 836/1110 [5:21:57<1:58:22, 25.92s/it] Setting `use_cache=False`...e computed-28 21:29:05,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0969, 'learning_rate': 0.00013573770491803278, 'epoch': 7.54}
{'loss': 0.0956, 'learning_rate': 0.00013524590163934425, 'epoch': 7.55}
{'loss': 0.0921, 'learning_rate': 0.00013475409836065573, 'epoch': 7.56}
{'loss': 0.1068, 'learning_rate': 0.0001342622950819672, 'epoch': 7.57}
76%|█████████████████████████████████████████████████████████▌ | 841/1110 [5:24:02<1:52:37, 25.12s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 22:25:50,316 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0797, 'learning_rate': 0.00013327868852459017, 'epoch': 7.58}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:26:31,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0789, 'learning_rate': 0.00013278688524590162, 'epoch': 7.59}
[WARNING|modeling_utils.py:388] 2022-03-28 22:26:41,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:26:45,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 22:26:52,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:26:56,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:10,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:14,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0798, 'learning_rate': 0.00013180327868852457, 'epoch': 7.61}
[WARNING|modeling_utils.py:388] 2022-03-28 22:27:30,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:27:36,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0625, 'learning_rate': 0.00013131147540983604, 'epoch': 7.62}
[WARNING|modeling_utils.py:388] 2022-03-28 22:27:42,856 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:27:49,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:53,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:27:55,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0729, 'learning_rate': 0.00013081967213114754, 'epoch': 7.63}
[WARNING|modeling_utils.py:388] 2022-03-28 22:27:59,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:03,467 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:05,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:07,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:09,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:11,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
76%|██████████████████████████████████████████████████████████ | 848/1110 [5:26:29<1:28:02, 20.16s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:15,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:17,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:19,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:21,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:22,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:24,549 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
76%|██████████████████████████████████████████████████████████▏ | 849/1110 [5:26:43<1:20:15, 18.45s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:27,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:29,544 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:31,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:34,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:35,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:38,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
77%|██████████████████████████████████████████████████████████▏ | 850/1110 [5:26:55<1:11:24, 16.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:39,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:42,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:43,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:45,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:47,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:48,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:49,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:52,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:54,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
77%|███████████████████████████████████████████████████████████▊ | 852/1110 [5:27:12<52:09, 12.13s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:28:56,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:00,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:04,168 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:07,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:11,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:14,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:20,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:24,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
77%|██████████████████████████████████████████████████████████▍ | 853/1110 [5:27:43<1:16:03, 17.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:31,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:34,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:38,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:41,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:45,214 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:48,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:29:52,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2042, 'learning_rate': 0.00012737704918032786, 'epoch': 7.69}
{'loss': 0.1538, 'learning_rate': 0.00012688524590163933, 'epoch': 7.7}
{'loss': 0.1337, 'learning_rate': 0.0001263934426229508, 'epoch': 7.71}
{'loss': 0.1333, 'learning_rate': 0.00012590163934426228, 'epoch': 7.72}
{'loss': 0.1308, 'learning_rate': 0.00012540983606557378, 'epoch': 7.73}
{'loss': 0.1062, 'learning_rate': 0.00012491803278688523, 'epoch': 7.74}
[WARNING|modeling_utils.py:388] 2022-03-28 22:32:27,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1091, 'learning_rate': 0.0001244262295081967, 'epoch': 7.74}
{'loss': 0.1118, 'learning_rate': 0.0001239344262295082, 'epoch': 7.75}
78%|███████████████████████████████████████████████████████████ | 862/1110 [5:31:44<1:46:54, 25.86s/it]
78%|███████████████████████████████████████████████████████████ | 863/1110 [5:32:09<1:45:21, 25.59s/it]
{'loss': 0.0905, 'learning_rate': 0.00012295081967213115, 'epoch': 7.77}
78%|███████████████████████████████████████████████████████████▏ | 864/1110 [5:32:34<1:43:58, 25.36s/it]
{'loss': 0.1006, 'learning_rate': 0.0001224590163934426, 'epoch': 7.78}
{'loss': 0.0995, 'learning_rate': 0.00012196721311475408, 'epoch': 7.79}
{'loss': 0.0921, 'learning_rate': 0.00012147540983606557, 'epoch': 7.8}
[WARNING|modeling_utils.py:388] 2022-03-28 22:35:14,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
78%|███████████████████████████████████████████████████████████▎ | 867/1110 [5:33:48<1:40:05, 24.71s/it]
{'loss': 0.0971, 'learning_rate': 0.00012098360655737703, 'epoch': 7.81}
[WARNING|modeling_utils.py:388] 2022-03-28 22:35:40,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:35:44,635 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
78%|███████████████████████████████████████████████████████████▍ | 868/1110 [5:34:11<1:37:34, 24.19s/it]
{'loss': 0.0866, 'learning_rate': 0.00011999999999999999, 'epoch': 7.83}
[WARNING|modeling_utils.py:388] 2022-03-28 22:36:23,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:36:27,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0981, 'learning_rate': 0.00011950819672131146, 'epoch': 7.83}
[WARNING|modeling_utils.py:388] 2022-03-28 22:36:42,301 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:36:46,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:36:52,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:36:58,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.072, 'learning_rate': 0.00011901639344262294, 'epoch': 7.84}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:07,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:15,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:22,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:25,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:37:27,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:30,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:32,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:34,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:36,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:39,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:40,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:42,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:44,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:46,398 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:49,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:51,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:53,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:54,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:56,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:37:59,322 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:00,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:03,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:04,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:07,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:09,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:11,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:12,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:15,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:17,555 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:19,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0819, 'learning_rate': 0.00011606557377049179, 'epoch': 7.9}
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:23,890 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:27,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:30,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:34,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:37,980 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:43,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:47,079 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:50,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:54,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:38:57,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:39:00,944 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:39:04,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:39:07,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1619, 'learning_rate': 0.00011508196721311474, 'epoch': 7.91}
{'loss': 0.1235, 'learning_rate': 0.00011459016393442623, 'epoch': 7.92}
{'loss': 0.0996, 'learning_rate': 0.00011409836065573769, 'epoch': 7.93}
{'loss': 0.0896, 'learning_rate': 0.00011360655737704917, 'epoch': 7.94}
{'loss': 0.0785, 'learning_rate': 0.00011311475409836063, 'epoch': 7.95}
{'loss': 0.0756, 'learning_rate': 0.00011262295081967212, 'epoch': 7.96}
[WARNING|modeling_utils.py:388] 2022-03-28 22:41:48,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:41:48,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:41:48,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:41:48,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:42:01,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:42:01,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:42:05,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:42:05,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:29:27,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0629, 'learning_rate': 0.00011163934426229507, 'epoch': 7.98}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:12,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:15,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:17,296 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:19,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:21,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:23,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
80%|████████████████████████████████████████████████████████████▋ | 887/1110 [5:40:40<1:17:51, 20.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:24,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:26,675 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:29,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:31,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:33,488 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
80%|████████████████████████████████████████████████████████████▊ | 888/1110 [5:40:51<1:06:29, 17.97s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:37,602 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:38,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:40,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:44,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:47,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:51,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:54,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:42:58,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 22:43:01,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1714, 'learning_rate': 0.00011016393442622949, 'epoch': 8.01}
{'loss': 0.1262, 'learning_rate': 0.00010967213114754098, 'epoch': 8.02}
{'loss': 0.089, 'learning_rate': 0.00010918032786885245, 'epoch': 8.03}
{'loss': 0.1003, 'learning_rate': 0.00010868852459016392, 'epoch': 8.04}
{'loss': 0.0913, 'learning_rate': 0.0001081967213114754, 'epoch': 8.04}
81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it]
{'loss': 0.0799, 'learning_rate': 0.00010770491803278687, 'epoch': 8.05}
{'loss': 0.0765, 'learning_rate': 0.00010721311475409835, 'epoch': 8.06}
81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0802, 'learning_rate': 0.00010672131147540983, 'epoch': 8.07} 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0641, 'learning_rate': 0.00010622950819672129, 'epoch': 8.08} 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 81%|█████████████████████████████████████████████████████████████▏ | 894/1110 [5:43:42<1:34:28, 26.24s/it] Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0614, 'learning_rate': 0.00010573770491803278, 'epoch': 8.09} Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.064, 'learning_rate': 0.00010524590163934425, 'epoch': 8.1} Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0568, 'learning_rate': 0.00010475409836065573, 'epoch': 8.11} Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:48:07,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0609, 'learning_rate': 0.0001042622950819672, 'epoch': 8.12}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:48:32,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0569, 'learning_rate': 0.00010377049180327867, 'epoch': 8.13}
[WARNING|modeling_bart.py:1051] 2022-03-28 22:49:09,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0584, 'learning_rate': 0.00010327868852459015, 'epoch': 8.13}
81%|█████████████████████████████████████████████████████████████▉ | 904/1110 [5:47:52<1:23:13, 24.24s/it]
{'loss': 0.0536, 'learning_rate': 0.00010278688524590164, 'epoch': 8.14}
82%|█████████████████████████████████████████████████████████████▉ | 905/1110 [5:48:14<1:20:46, 23.64s/it]
{'loss': 0.0542, 'learning_rate': 0.0001022950819672131, 'epoch': 8.15}
[WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0597, 'learning_rate': 0.00010180327868852458, 'epoch': 8.16} [WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:50:12,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 22:50:28,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 22:50:36,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:50:36,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:50:36,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0608, 'learning_rate': 0.00010131147540983606, 'epoch': 8.17} [WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 22:50:43,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 22:42:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:51,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 22:50:55,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
82%|██████████████████████████████████████████████████████████████▏ | 908/1110 [5:49:15<1:11:40, 21.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0479, 'learning_rate': 0.00010081967213114753, 'epoch': 8.18}
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:03,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:05,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:11,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:13,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:15,499 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:17,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:19,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:21,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:23,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:25,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:26,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:28,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:32,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:33,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:35,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:36,857 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:39,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:41,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:43,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:45,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:47,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:49,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:51,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:53,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:56,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:57,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:51:59,460 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:00,233 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:04,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:07,791 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:11,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:14,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:18,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:22,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:25,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:29,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:32,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:36,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:39,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:43,084 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:46,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:51,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:52:55,382 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1007, 'learning_rate': 9.737704918032786e-05, 'epoch': 8.24}
{'loss': 0.0901, 'learning_rate': 9.688524590163933e-05, 'epoch': 8.25}
{'loss': 0.0894, 'learning_rate': 9.639344262295081e-05, 'epoch': 8.26}
{'loss': 0.0852, 'learning_rate': 9.59016393442623e-05, 'epoch': 8.27}
{'loss': 0.0744, 'learning_rate': 9.540983606557375e-05, 'epoch': 8.28}
{'loss': 0.067, 'learning_rate': 9.491803278688524e-05, 'epoch': 8.29}
{'loss': 0.0717, 'learning_rate': 9.442622950819672e-05, 'epoch': 8.3}
{'loss': 0.0604, 'learning_rate': 9.393442622950819e-05, 'epoch': 8.3}
{'loss': 0.0617, 'learning_rate': 9.344262295081966e-05, 'epoch': 8.31}
{'loss': 0.0527, 'learning_rate': 9.295081967213114e-05, 'epoch': 8.32}
{'loss': 0.057, 'learning_rate': 9.245901639344261e-05, 'epoch': 8.33}
{'loss': 0.0499, 'learning_rate': 9.19672131147541e-05, 'epoch': 8.34}
 84%|███████████████████████████████████████████████████████████████▍ | 927/1110 [5:56:24<1:14:54, 24.56s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 22:58:11,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0546, 'learning_rate': 9.098360655737704e-05, 'epoch': 8.36}
{'loss': 0.0565, 'learning_rate': 9.049180327868852e-05, 'epoch': 8.37}
[WARNING|modeling_utils.py:388] 2022-03-28 22:59:08,823 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:59:12,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0507, 'learning_rate': 8.999999999999999e-05, 'epoch': 8.38}
[WARNING|modeling_utils.py:388] 2022-03-28 22:59:27,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:59:35,325 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:59:39,307 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0496, 'learning_rate': 8.950819672131147e-05, 'epoch': 8.39}
[WARNING|modeling_utils.py:388] 2022-03-28 22:59:49,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 22:59:55,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 84%|███████████████████████████████████████████████████████████████▊ | 932/1110 [5:58:16<1:05:11, 21.98s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 23:00:01,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:06,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:00:10,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:00:12,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:16,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 84%|███████████████████████████████████████████████████████████████▉ | 933/1110 [5:58:34<1:02:02, 21.03s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 23:00:20,622 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:24,547 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:26,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:00:30,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:00:32,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:36,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:38,692 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:40,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:42,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:44,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:46,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:47,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:51,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:52,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:54,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:57,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:00:58,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:01,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:02,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:05,469 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:06,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:08,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:11,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:13,145 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:15,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:17,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:19,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0566, 'learning_rate': 8.60655737704918e-05, 'epoch': 8.45}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:22,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:26,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:29,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:33,409 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:37,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:40,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:44,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:47,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1376, 'learning_rate': 8.557377049180327e-05, 'epoch': 8.46}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:51,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:54,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:01:58,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:02:01,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:02:05,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:02:08,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1047, 'learning_rate': 8.508196721311476e-05, 'epoch': 8.47}
{'loss': 0.0878, 'learning_rate': 8.459016393442622e-05, 'epoch': 8.48}
{'loss': 0.0905, 'learning_rate': 8.40983606557377e-05, 'epoch': 8.48}
85%|████████████████████████████████████████████████████████████████▌ | 943/1110 [6:01:57<1:09:55, 25.12s/it]
{'loss': 0.0747, 'learning_rate': 8.360655737704916e-05, 'epoch': 8.49}
{'loss': 0.0621, 'learning_rate': 8.311475409836065e-05, 'epoch': 8.5}
{'loss': 0.075, 'learning_rate': 8.262295081967212e-05, 'epoch': 8.51}
{'loss': 0.0683, 'learning_rate': 8.21311475409836e-05, 'epoch': 8.52}
{'loss': 0.0557, 'learning_rate': 8.163934426229507e-05, 'epoch': 8.53}
85%|████████████████████████████████████████████████████████████████▉ | 948/1110 [6:04:08<1:09:46, 25.84s/it]
{'loss': 0.0588, 'learning_rate': 8.114754098360656e-05, 'epoch': 8.54}
{'loss': 0.0597, 'learning_rate': 8.065573770491802e-05, 'epoch': 8.55}
{'loss': 0.0598, 'learning_rate': 8.01639344262295e-05, 'epoch': 8.56}
Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0542, 'learning_rate': 7.967213114754097e-05, 'epoch': 8.57} Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...e computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 23:07:28,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.051, 'learning_rate': 7.918032786885245e-05, 'epoch': 8.57}
{'loss': 0.0587, 'learning_rate': 7.868852459016393e-05, 'epoch': 8.58}
{'loss': 0.0526, 'learning_rate': 7.81967213114754e-05, 'epoch': 8.59}
{'loss': 0.0481, 'learning_rate': 7.770491803278687e-05, 'epoch': 8.6}
 86%|███████████████████████████████████████████████████████████████████▏ | 956/1110 [6:07:16<58:33, 22.82s/it]
{'loss': 0.0511, 'learning_rate': 7.721311475409836e-05, 'epoch': 8.61}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:04,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:12,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:17,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.038, 'learning_rate': 7.672131147540982e-05, 'epoch': 8.62}
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:24,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:28,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:30,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:34,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:09:37,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 86%|███████████████████████████████████████████████████████████████████▎ | 958/1110 [6:07:55<53:13, 21.01s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:41,023 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:43,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:45,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:51,088 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:53,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:55,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:57,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:09:59,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:00,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:02,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:04,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:06,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:08,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:11,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:13,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:14,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:17,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:19,180 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:21,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:23,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:25,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:27,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:29,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:31,315 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:33,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:35,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:37,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:39,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0697, 'learning_rate': 7.377049180327868e-05, 'epoch': 8.67}
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:43,174 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:46,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:50,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:53,949 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:10:57,523 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:11:01,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1125, 'learning_rate': 7.327868852459015e-05, 'epoch': 8.68}
{'loss': 0.0963, 'learning_rate': 7.278688524590164e-05, 'epoch': 8.69}
{'loss': 0.0785, 'learning_rate': 7.229508196721311e-05, 'epoch': 8.7}
{'loss': 0.0818, 'learning_rate': 7.180327868852459e-05, 'epoch': 8.71}
{'loss': 0.0798, 'learning_rate': 7.131147540983606e-05, 'epoch': 8.72}
{'loss': 0.064, 'learning_rate': 7.081967213114753e-05, 'epoch': 8.73}
{'loss': 0.0586, 'learning_rate': 7.032786885245901e-05, 'epoch': 8.74}
{'loss': 0.0594, 'learning_rate': 6.983606557377048e-05, 'epoch': 8.74}
{'loss': 0.0661, 'learning_rate': 6.934426229508197e-05, 'epoch': 8.75}
{'loss': 0.0698, 'learning_rate': 6.885245901639344e-05, 'epoch': 8.76}
{'loss': 0.0558, 'learning_rate': 6.836065573770492e-05, 'epoch': 8.77}
g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0524, 'learning_rate': 6.786885245901639e-05, 'epoch': 8.78} g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0562, 'learning_rate': 6.737704918032786e-05, 'epoch': 8.79} 88%|████████████████████████████████████████████████████████████████████▌ | 976/1110 [6:14:40<55:33, 24.87s/it]g-point operations will not be computed-28 22:50:59,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0507, 'learning_rate': 6.688524590163934e-05, 'epoch': 8.8}
[WARNING|modeling_utils.py:388] 2022-03-28 23:17:11,118 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0477, 'learning_rate': 6.639344262295081e-05, 'epoch': 8.81}
{'loss': 0.0525, 'learning_rate': 6.590163934426228e-05, 'epoch': 8.82}
[WARNING|modeling_utils.py:388] 2022-03-28 23:17:39,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:17:43,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:17:56,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:18:00,296 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:18:14,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|████████████████████████████████████████████████████████████████████▉ | 981/1110 [6:16:35<49:05, 22.83s/it]
{'loss': 0.0446, 'learning_rate': 6.491803278688524e-05, 'epoch': 8.83}
[WARNING|modeling_utils.py:388] 2022-03-28 23:18:29,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:18:37,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
88%|█████████████████████████████████████████████████████████████████████ | 982/1110 [6:16:55<47:14, 22.14s/it]
{'loss': 0.0443, 'learning_rate': 6.442622950819672e-05, 'epoch': 8.84}
[WARNING|modeling_utils.py:388] 2022-03-28 23:18:45,468 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:18:51,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:18:53,816 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:18:58,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0344, 'learning_rate': 6.393442622950819e-05, 'epoch': 8.85}
[WARNING|modeling_utils.py:388] 2022-03-28 23:19:01,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:19:04,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:19:12,201 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:15,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████████▏ | 984/1110 [6:17:33<43:05, 20.52s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:19,991 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:21,905 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:23,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:25,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:27,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:30,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:18,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 89%|█████████████████████████████████████████████████████████████████████▏ | 985/1110 [6:17:48<39:06, 18.77s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
89%|█████████████████████████████████████████████████████████████████████▏ | 985/1110 [6:17:48<39:06, 18.77s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:34,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:35,742 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:37,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:40,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:19:41,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:19:32,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 89%|█████████████████████████████████████████████████████████████████████▎ | 986/1110 [6:18:00<34:31, 16.71s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:19:44,317 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:45,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:48,090 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:50,351 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:52,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:53,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:54,477 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:56,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:19:58,835 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████████▍ | 988/1110 [6:18:16<24:52, 12.24s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:01,475 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:05,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:08,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:12,248 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:15,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:19,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:22,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:26,203 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
89%|█████████████████████████████████████████████████████████████████████▍ | 989/1110 [6:18:45<34:28, 17.10s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:33,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:36,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:40,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:43,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:46,828 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:20:52,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0814, 'learning_rate': 6.0491803278688514e-05, 'epoch': 8.91}
{'loss': 0.0625, 'learning_rate': 5.9999999999999995e-05, 'epoch': 8.92}
{'loss': 0.0625, 'learning_rate': 5.950819672131147e-05, 'epoch': 8.93}
{'loss': 0.0568, 'learning_rate': 5.901639344262294e-05, 'epoch': 8.94}
{'loss': 0.0485, 'learning_rate': 5.8524590163934416e-05, 'epoch': 8.95}
90%|█████████████████████████████████████████████████████████████████████▉ | 995/1110 [6:21:19<45:52, 23.94s/it]
{'loss': 0.0434, 'learning_rate': 5.8032786885245896e-05, 'epoch': 8.96}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:23,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0434, 'learning_rate': 5.754098360655737e-05, 'epoch': 8.97}
[WARNING|modeling_utils.py:388] 2022-03-28 23:23:37,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:23:37,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-28 23:23:46,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████████ | 997/1110 [6:22:04<43:13, 22.95s/it] Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|██████████████████████████████████████████████████████████████████████ | 997/1110 [6:22:04<43:13, 22.95s/it] Setting `use_cache=False`...e computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:20:29,817 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 23:23:50,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:57,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:23:59,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:01,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:03,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
90%|██████████████████████████████████████████████████████████████████████▏ | 998/1110 [6:22:21<39:34, 21.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:05,503 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:07,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:10,379 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:11,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:14,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
90%|██████████████████████████████████████████████████████████████████████▏ | 999/1110 [6:22:32<33:49, 18.28s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:18,562 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:19,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:21,260 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:25,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:28,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:32,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:35,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:39,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:24:43,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-28 23:24:49,135 >> Num examples = 2642rue` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. 
03/28/2022 23:30:15 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 0.35239124298095703, 'eval_wer': 0.10420468068226894, 'eval_runtime': 326.5742, 'eval_samples_per_second': 8.09, 'eval_steps_per_second': 0.508, 'epoch': 9.01}
{'loss': 0.0658, 'learning_rate': 5.5081967213114745e-05, 'epoch': 9.02}
{'loss': 0.0527, 'learning_rate': 5.4590163934426226e-05, 'epoch': 9.03}
90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it]
{'loss': 0.0639, 'learning_rate': 5.40983606557377e-05, 'epoch': 9.04}
{'loss': 0.0622, 'learning_rate': 5.360655737704917e-05, 'epoch': 9.04}
90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0566, 'learning_rate': 5.3114754098360647e-05, 'epoch': 9.05} 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1003/1110 [6:31:35<2:38:45, 89.02s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▉ | 1006/1110 [6:32:57<1:23:58, 48.45s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▉ | 1006/1110 [6:32:57<1:23:58, 48.45s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0423, 'learning_rate': 5.262295081967213e-05, 'epoch': 9.06} 91%|███████████████████████████████████████████████████████████████████▉ | 1006/1110 [6:32:57<1:23:58, 48.45s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|███████████████████████████████████████████████████████████████████▉ | 1006/1110 [6:32:57<1:23:58, 48.45s/it] Setting `use_cache=False`...1] 2022-03-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 23:34:49,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0496, 'learning_rate': 5.21311475409836e-05, 'epoch': 9.07}
{'loss': 0.0438, 'learning_rate': 5.1639344262295074e-05, 'epoch': 9.08}
{'loss': 0.04, 'learning_rate': 5.114754098360655e-05, 'epoch': 9.09}
[WARNING|modeling_utils.py:388] 2022-03-28 23:36:23,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0399, 'learning_rate': 5.065573770491803e-05, 'epoch': 9.1}
[WARNING|modeling_utils.py:388] 2022-03-28 23:36:42,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0494, 'learning_rate': 5.01639344262295e-05, 'epoch': 9.11}
91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0367, 'learning_rate': 4.9672131147540976e-05, 'epoch': 9.12} 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0387, 'learning_rate': 4.918032786885245e-05, 'epoch': 9.13} 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 91%|██████████████████████████████████████████████████████████████████████▏ | 1012/1110 [6:35:31<45:25, 27.81s/it]g-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:37:51,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0372, 'learning_rate': 4.868852459016393e-05, 'epoch': 9.13} [WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:38:02,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:24:16,726 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0334, 'learning_rate': 4.8196721311475404e-05, 'epoch': 9.14}
[WARNING|modeling_utils.py:388] 2022-03-28 23:38:37,030 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0286, 'learning_rate': 4.770491803278688e-05, 'epoch': 9.15}
[WARNING|modeling_utils.py:388] 2022-03-28 23:39:05,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0352, 'learning_rate': 4.721311475409836e-05, 'epoch': 9.16}
[WARNING|modeling_utils.py:388] 2022-03-28 23:39:20,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:39:30,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0299, 'learning_rate': 4.672131147540983e-05, 'epoch': 9.17}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:39:36,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:39:40,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:39:44,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:39:48,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:39:50,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:39:55,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:39:58,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:00,877 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:03,007 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:05,062 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:07,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:09,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:11,166 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:13,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:40:14,915 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:20,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:22,441 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:24,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:25,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:29,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:30,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:32,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:34,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:36,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:39,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:41,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:42,632 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:44,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:46,913 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:49,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:51,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:52,954 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:40:56,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:00,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:03,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:07,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:10,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:14,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:17,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:21,489 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0687, 'learning_rate': 4.327868852459016e-05, 'epoch': 9.23}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:25,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:28,688 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:32,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:35,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:39,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:41:42,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.071, 'learning_rate': 4.2786885245901634e-05, 'epoch': 9.24}
{'loss': 0.0587, 'learning_rate': 4.229508196721311e-05, 'epoch': 9.25}
[WARNING|modeling_utils.py:388] 2022-03-28 23:42:40,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0558, 'learning_rate': 4.180327868852458e-05, 'epoch': 9.26}
{'loss': 0.0559, 'learning_rate': 4.131147540983606e-05, 'epoch': 9.27}
{'loss': 0.0525, 'learning_rate': 4.0819672131147536e-05, 'epoch': 9.28}
{'loss': 0.0428, 'learning_rate': 4.032786885245901e-05, 'epoch': 9.29}
{'loss': 0.05, 'learning_rate': 3.983606557377048e-05, 'epoch': 9.3}
{'loss': 0.043, 'learning_rate': 3.9344262295081964e-05, 'epoch': 9.3}
{'loss': 0.0455, 'learning_rate': 3.885245901639344e-05, 'epoch': 9.31}
93%|███████████████████████████████████████████████████████████████████████▊ | 1035/1110 [6:44:06<32:02, 25.64s/it]
{'loss': 0.0381, 'learning_rate': 3.836065573770491e-05, 'epoch': 9.32}
[WARNING|modeling_utils.py:388] 2022-03-28 23:46:06,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 93%|███████████████████████████████████████████████████████████████████████▊ | 1036/1110 [6:44:31<31:15, 25.35s/it]
{'loss': 0.0315, 'learning_rate': 3.786885245901639e-05, 'epoch': 9.33}
{'loss': 0.0366, 'learning_rate': 3.7377049180327865e-05, 'epoch': 9.34}
[WARNING|modeling_utils.py:388] 2022-03-28 23:46:41,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:46:52,206 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:47:00,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0286, 'learning_rate': 3.688524590163934e-05, 'epoch': 9.35}
{'loss': 0.0401, 'learning_rate': 3.639344262295082e-05, 'epoch': 9.36}
[WARNING|modeling_utils.py:388] 2022-03-28 23:47:47,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0402, 'learning_rate': 3.590163934426229e-05, 'epoch': 9.37}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:04,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:08,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0349, 'learning_rate': 3.540983606557377e-05, 'epoch': 9.38}
[WARNING|modeling_utils.py:388] 2022-03-28 23:48:17,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:48:32,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:48:36,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:48:42,839 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:51,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0276, 'learning_rate': 3.442622950819672e-05, 'epoch': 9.39}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:48:57,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:03,311 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:49:07,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:11,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 94%|████████████████████████████████████████████████████████████████████████▍ | 1044/1110 [6:47:29<23:19, 21.21s/it]
[WARNING|modeling_utils.py:388] 2022-03-28 23:49:15,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:19,257 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:21,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:23,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:25,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:27,552 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:29,530 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:31,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:33,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:35,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:37,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:41,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:43,358 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:45,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 94%|████████████████████████████████████████████████████████████████████████▌ | 1046/1110 [6:48:02<20:04, 18.82s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:46,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:50,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:51,637 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:53,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:55,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:57,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:49:58,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:00,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:01,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:03,618 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:05,783 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 94%|████████████████████████████████████████████████████████████████████████▋ | 1048/1110 [6:48:24<15:01, 14.54s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:50:08,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:09,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:12,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:14,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 95%|████████████████████████████████████████████████████████████████████████▊ | 1049/1110 [6:48:31<12:30, 12.30s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:50:15,970 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:19,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:23,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:26,916 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:30,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:34,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:37,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:41,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 95%|████████████████████████████████████████████████████████████████████████▊ | 1050/1110 [6:49:00<17:18, 17.31s/it][WARNING|modeling_bart.py:1051] 2022-03-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:48,400 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:51,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:55,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:50:58,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:51:02,408 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:51:05,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0621, 'learning_rate': 3.049180327868852e-05, 'epoch': 9.47}
{'loss': 0.054, 'learning_rate': 2.9999999999999997e-05, 'epoch': 9.48}
{'loss': 0.0524, 'learning_rate': 2.950819672131147e-05, 'epoch': 9.48}
{'loss': 0.0471, 'learning_rate': 2.9016393442622948e-05, 'epoch': 9.49}
{'loss': 0.048, 'learning_rate': 2.8524590163934422e-05, 'epoch': 9.5}
{'loss': 0.0498, 'learning_rate': 2.80327868852459e-05, 'epoch': 9.51}
{'loss': 0.0381, 'learning_rate': 2.7540983606557373e-05, 'epoch': 9.52}
{'loss': 0.0503, 'learning_rate': 2.704918032786885e-05, 'epoch': 9.53}
{'loss': 0.0317, 'learning_rate': 2.6557377049180323e-05, 'epoch': 9.54}
{'loss': 0.0365, 'learning_rate': 2.60655737704918e-05, 'epoch': 9.55}
{'loss': 0.0385, 'learning_rate': 2.5573770491803274e-05, 'epoch': 9.56}
96%|█████████████████████████████████████████████████████████████████████████▋ | 1062/1110 [6:54:17<20:02, 25.06s/it]
{'loss': 0.0298, 'learning_rate': 2.508196721311475e-05, 'epoch': 9.57}
[WARNING|modeling_utils.py:388] 2022-03-28 23:56:27,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
96%|█████████████████████████████████████████████████████████████████████████▊ | 1064/1110 [6:55:04<18:37, 24.29s/it]
{'loss': 0.0314, 'learning_rate': 2.4098360655737702e-05, 'epoch': 9.58}
[WARNING|modeling_utils.py:388] 2022-03-28 23:56:58,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0316, 'learning_rate': 2.360655737704918e-05, 'epoch': 9.59}
[WARNING|modeling_bart.py:1051] 2022-03-28 23:57:37,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:57:42,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0386, 'learning_rate': 2.262295081967213e-05, 'epoch': 9.61} [WARNING|modeling_utils.py:388] 2022-03-28 23:57:42,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:57:42,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:57:42,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:58:05,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:58:05,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:58:05,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-28 23:58:05,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-28 23:50:44,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:13,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0289, 'learning_rate': 2.2131147540983603e-05, 'epoch': 9.62}
[WARNING|modeling_utils.py:388] 2022-03-28 23:58:21,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:58:23,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:58:29,568 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:58:31,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 96%|██████████████████████████████████████████████████████████████████████████▏ | 1069/1110 [6:56:51<14:30, 21.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:36,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 23:58:39,914 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:58:42,073 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-28 23:58:44,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:48,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:50,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:52,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:54,201 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:56,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:58,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:58:59,859 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:04,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:07,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 96%|██████████████████████████████████████████████████████████████████████████▎ | 1071/1110 [6:57:25<12:18, 18.93s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:09,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:12,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:14,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:15,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:18,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:20,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:22,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:24,190 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:26,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:28,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 97%|██████████████████████████████████████████████████████████████████████████▍ | 1073/1110 [6:57:47<09:04, 14.72s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:31,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:33,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:34,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:36,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 97%|██████████████████████████████████████████████████████████████████████████▌ | 1074/1110 [6:57:54<07:28, 12.46s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:39,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:43,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:46,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:50,265 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:53,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-28 23:59:57,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:00,868 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:04,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 97%|██████████████████████████████████████████████████████████████████████████▌ | 1075/1110 [6:58:23<10:09, 17.40s/it]
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:11,579 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:15,064 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:18,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:21,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:21,987 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0574, 'learning_rate': 1.819672131147541e-05, 'epoch': 9.69} [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0483, 'learning_rate': 1.7704918032786883e-05, 'epoch': 9.7} [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:00:25,407 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.0521, 'learning_rate': 1.6721311475409834e-05, 'epoch': 9.72}
{'loss': 0.0498, 'learning_rate': 1.622950819672131e-05, 'epoch': 9.73}
{'loss': 0.0438, 'learning_rate': 1.5737704918032785e-05, 'epoch': 9.74}
{'loss': 0.0459, 'learning_rate': 1.524590163934426e-05, 'epoch': 9.74}
{'loss': 0.0403, 'learning_rate': 1.4754098360655736e-05, 'epoch': 9.75}
{'loss': 0.0334, 'learning_rate': 1.4262295081967211e-05, 'epoch': 9.76}
98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it]
{'loss': 0.0457, 'learning_rate': 1.3770491803278686e-05, 'epoch': 9.77}
{'loss': 0.0393, 'learning_rate': 1.3278688524590162e-05, 'epoch': 9.78}
{'loss': 0.0364, 'learning_rate': 1.2786885245901637e-05, 'epoch': 9.79}
{'loss': 0.0329, 'learning_rate': 1.2295081967213112e-05, 'epoch': 9.8}
{'loss': 0.042, 'learning_rate': 1.180327868852459e-05, 'epoch': 9.81}
98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it] Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it] Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0375, 'learning_rate': 1.1311475409836065e-05, 'epoch': 9.82} 98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it] Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it] Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it] Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▎ | 1085/1110 [7:02:50<10:40, 25.63s/it] Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:06:46,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-29 00:06:46,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:06:50,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:06:50,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:06:50,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:06:50,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:06:50,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0371, 'learning_rate': 1.081967213114754e-05, 'epoch': 9.83} [WARNING|modeling_utils.py:388] 2022-03-29 00:07:00,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-29 00:07:00,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:04,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:04,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:04,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:04,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:04,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:04,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-29 00:00:08,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-29 00:07:16,550 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0285, 'learning_rate': 1.0327868852459016e-05, 'epoch': 9.83} [WARNING|modeling_utils.py:388] 2022-03-29 00:07:28,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-29 00:07:34,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 0.0278, 'learning_rate': 9.836065573770491e-06, 'epoch': 9.84} [WARNING|modeling_utils.py:388] 2022-03-29 00:07:40,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-29 00:07:45,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:49,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-29 00:07:53,420 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-29 00:07:57,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-29 00:07:59,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:01,637 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:03,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:05,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:07,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:09,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-29 00:08:11,691 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:13,705 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:15,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:17,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-29 00:08:19,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:23,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:25,476 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:27,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 99%|████████████████████████████████████████████████████████████████████████████ | 1096/1110 [7:06:44<04:19, 18.50s/it] [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:28,949 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:30,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:33,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:34,973 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:37,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:39,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:40,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:41,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:43,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:46,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:48,171 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:49,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:50,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:52,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:08:53,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 99%|████████████████████████████████████████████████████████████████████████████▏| 1099/1110 [7:07:12<02:12, 12.03s/it]
[WARNING|modeling_bart.py:1051] 2022-03-29 00:08:57,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:00,903 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:04,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:07,998 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:11,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:15,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:18,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:21,940 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
99%|████████████████████████████████████████████████████████████████████████████▎| 1100/1110 [7:07:40<02:49, 16.94s/it] [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:28,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:32,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:35,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:09:39,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:42,612 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:45,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-29 00:09:49,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0564, 'learning_rate': 5.901639344262295e-06, 'epoch': 9.91}
{'loss': 0.0421, 'learning_rate': 5.40983606557377e-06, 'epoch': 9.92}
{'loss': 0.0421, 'learning_rate': 4.9180327868852455e-06, 'epoch': 9.93}
99%|████████████████████████████████████████████████████████████████████████████▌| 1104/1110 [7:09:27<02:24, 24.06s/it]
{'loss': 0.0357, 'learning_rate': 4.426229508196721e-06, 'epoch': 9.94}
100%|████████████████████████████████████████████████████████████████████████████▋| 1105/1110 [7:09:52<02:00, 24.19s/it]
{'loss': 0.0404, 'learning_rate': 3.934426229508196e-06, 'epoch': 9.95}
{'loss': 0.031, 'learning_rate': 3.4426229508196716e-06, 'epoch': 9.96}
{'loss': 0.0362, 'learning_rate': 2.9508196721311474e-06, 'epoch': 9.97}
[WARNING|modeling_utils.py:388] 2022-03-29 00:12:34,971 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-29 00:12:41,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0295, 'learning_rate': 2.4590163934426227e-06, 'epoch': 9.98}
[WARNING|modeling_utils.py:388] 2022-03-29 00:12:49,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-29 00:12:51,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-29 00:12:53,659 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-29 00:12:57,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:12:59,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:01,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:03,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:04,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:07,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:08,966 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-29 00:13:11,326 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0308, 'learning_rate': 1.4754098360655737e-06, 'epoch': 10.0}
[INFO|trainer.py:2114] 2022-03-29 00:13:12,718 >> Saving model checkpoint to ./
[INFO|trainer.py:2114] 2022-03-29 00:13:24,702 >> Saving model checkpoint to ./
Upload file pytorch_model.bin: 0%| | 32.0k/2.19G [00:00
Upload file pytorch_model.bin: 1%|▍ | 19.7M/2.19G [00:02<03:20, 11.6MB/s]
Upload file pytorch_model.bin: 2%|█ | 53.6M/2.19G [00:04<02:26, 15.7MB/s]
Upload file pytorch_model.bin: 4%|█▊ | 86.4M/2.19G [00:06<02:16, 16.5MB/s]
03/29/2022 00:16:52 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
   c0186b8..967ac64  main -> main
03/29/2022 00:17:17 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file wandb/run-20220328_170142-by95ehra/run-by95ehra.wandb: 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s]
***** train metrics *****
  epoch = 10.0
  train_loss = 2.1242
  train_runtime = 7:11:31.00
  train_samples = 28538
  train_samples_per_second = 11.022
  train_steps_per_second = 0.043
03/29/2022 00:17:19 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-29 00:17:19,971 >> Batch size = 8 100%|█████████████| 216M/216M [00:16<00:00, 14.1MB/s] Setting `use_cache=False`...e computed-29 00:09:25,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/29/2022 00:24:08 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
***** eval metrics *****
  epoch                   = 10.0
  eval_loss               = 0.3543
  eval_runtime            = 0:06:48.71
  eval_samples            = 2642
  eval_samples_per_second = 6.464
  eval_steps_per_second   = 0.406
  eval_wer                = 0.1002