0%| | 0/2230 [00:00<?, ?it/s]
> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:38,446 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:39,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:40,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:41,567 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:42,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:43,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:44,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:45,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:45,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:47,142 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:47,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:48,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:49,640 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:50,844 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:51,509 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:52,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:53,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:54,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:55,274 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:56,459 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:57,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 16:59:58,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 16:59:58,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:00,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:00,757 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:01,927 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:02,553 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:03,722 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:04,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:05,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:06,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%| | 1/2230 [00:30<18:42:06, 30.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:07,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:08,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:09,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:09,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:10,962 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:11,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:12,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:13,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:14,595 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:15,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:16,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:17,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:18,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:18,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:20,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:20,676 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:21,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:22,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:23,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:24,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:25,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-28 17:00:26,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-28 17:00:27,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...